Under consideration for publication in Theory and Practice of Logic Programming 






Finding Similar/Diverse Solutions
in Answer Set Programming*

THOMAS EITER 

Institute of Information Systems, Vienna University of Technology, Vienna, Austria 
E-mail: eiter@kr.tuwien.ac.at 

ESRA ERDEM and HALIT ERDOGAN 

Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey 
E-mail: {esraerdem,halit}@sabanciuniv.edu

MICHAEL FINK 

Institute of Information Systems, Vienna University of Technology, Vienna, Austria 
E-mail: fink@kr.tuwien.ac.at

submitted 11 January 2011; revised 27 July 2011; accepted 11 August 2011 



Abstract 

For some computational problems (e.g., product configuration, planning, diagnosis, query answering, 
phylogeny reconstruction) computing a set of similar/diverse solutions may be desirable for better 
decision-making. With this motivation, we have studied several decision/optimization versions of this 
problem in the context of Answer Set Programming (ASP), analyzed their computational complex- 
ity, and introduced offline/online methods to compute similar/diverse solutions of such computational 
problems with respect to a given distance function. All these methods rely on the idea of comput- 
ing solutions to a problem by means of finding the answer sets for an ASP program that describes 
the problem. The offline methods compute all solutions of a problem in advance using the ASP for- 
mulation of the problem with an existing ASP solver, like CLASP, and then identify similar/diverse 
solutions using some clustering methods (possibly in ASP as well). The online methods compute 
similar/diverse solutions of a problem following one of the three approaches: by reformulating the 
ASP representation of the problem to compute similar/diverse solutions at once using an existing 
ASP solver; by computing similar/diverse solutions iteratively (one after another) using an existing
ASP solver; by modifying the search algorithm of an ASP solver to compute similar/diverse solutions
incrementally. All these methods are sound; the offline method and the first online method are
complete whereas the others are not. We have modified CLASP to implement the last online method
and called it CLASP-NK. In the first two online methods, the given distance function is represented
in ASP; in the last one, however, it is implemented in C++. We have shown the applicability and the
effectiveness of these methods using CLASP or CLASP-NK on two sorts of problems with different
distance measures: on a real-world problem in phylogenetics (i.e., reconstruction of similar/diverse
phylogenies for Indo-European languages), and on several planning problems in a well-known
domain (i.e., Blocks World). We have observed that in terms of computational efficiency (both time and
space) the last online method outperforms the others; it also allows us to compute similar/diverse
solutions when the distance function cannot be represented in ASP (e.g., due to some mathematical
functions not supported by the ASP solvers) but can be easily implemented in C++.

* Part of the results in this paper are contained, in preliminary form, in Proceedings of the 25th International
Conference on Logic Programming (ICLP 2009). This work was partially supported by FWF (Austrian Science
Fund) project P20841, the Wolfgang Pauli Institute, and TUBITAK Grants 107E229 and 108E229.

KEYWORDS: similar/diverse solutions, answer set programming, similar/diverse phylogenies, sim- 
ilar/diverse plans 



1 Introduction 

For many computational problems, the main concern is to find a best solution (e.g., a most 
preferred product configuration, a shortest plan, a most parsimonious phylogeny) with re- 
spect to some well-described criterion. On the other hand, in many real-world applications, 
computing a subset of good solutions that are similar/diverse may be desirable for better 
decision-making. For one reason, the given computational problem may have too many 
good solutions, and the user may want to examine only a few of them to pick one. Also, 
in many real-world applications users often take into account further criteria
that are not included in the formulation of the optimization problem; in such cases, good
solutions similar to a best one may also be useful. Here are some examples from several 
domains illustrating the usefulness of finding similar/diverse solutions. 



Product configuration Consider, for instance, a variation of the example given in (Hebrard
et al. 2005) about buying a car. Suppose that there is a product advisor that asks
customers about their constraints/preferences about a car, and then lists the available ones
that match their constraints/preferences. However, such a list may be too long. In that case,
the customer might ask for a few cars that not only suit her constraints/preferences but also
are as diverse as possible. Then, if she likes one particular car among them, she might ask
for other cars that are as similar as possible to this particular car. Also, the customer may
have other (possibly secondary) criteria that the product advisor has not asked about; and
thus the best alternatives listed by the product advisor may not cover some of the good
possibilities. Then, the user may ask for a couple of good enough configurations that are
distant from a set of best configurations.

Planning Given an initial state, goal conditions, and a description of actions, planning is 
the problem of finding a sequence of actions (i.e., a plan) that would lead the initial state 
to a goal state. Planning is applied in various domains, such as robotics, web service com- 
position, and genome rearrangement. In planning, it may be desirable to compute a set of 
plans that are similar to each other, so that, when the plan that is being executed fails, one 
can switch to a most similar one. For instance, consider a variation of the example given
in (Srivastava et al. 2007) in connection with modeling web service composition as a
planning problem (McIlraith and Son 2002): suppose that the web service engine computes
a plan/composition; then it can compute a set of compositions similar to this particular
one, so that if a failure occurs while executing one composition, an alternative composition
that is less likely to fail simultaneously can be used (Chafle et al. 2006). Alternatively,
let us consider planning in the context of robotics in a dynamic environment with
uncertainties. If the plan failure is, for instance, due to some collisions with an obstacle as
in the scenarios presented in (Caldiran et al. 2009), the agent may want to find a plan that
is distant from the previously computed plan so that it does not collide with the obstacle
again.

Phylogeny reconstruction Phylogeny reconstruction is the problem of inferring a leaf- 
labeled rooted directed tree (i.e., phylogeny) that would characterize the evolutionary re- 
lations between a family of species based on their shared traits. Phylogeny reconstruc- 
tion is important for research areas as disparate as genetics, historical linguistics, zoology, 
anthropology, archaeology, etc. For example, a phylogeny of parasites may help zoologists
to understand the evolution of human diseases (Brooks and McLennan 1991); a
phylogeny of languages may help scientists to better understand human migrations (White
and O'Connell 1982). For a given set of taxonomic units, some existing phylogenetic
systems, like that of (Brooks et al. 2005; Brooks et al. 2007), generate more than one good
phylogeny that explains the evolutionary relationships between the given taxonomic units.
However, a system usually computes too many phylogenies, and an expert needs
to compare them in detail, analyzing the similar/diverse ones with respect
to some distance measure, to pick the most plausible ones.

Motivated by such examples, we have studied various problems related to computing 
similar/diverse solutions in the context of a new declarative programming paradigm, called 
Answer Set Programming (ASP) (Lifschitz 2008). We have also introduced general
offline/online methods in ASP that can be applied to various domains for such computations.

In ASP, a combinatorial search problem is represented as an ASP program whose models
(called "answer sets") correspond to the solutions. The answer sets for the given program
can be computed by special systems called answer set solvers, such as CLASP (Gebser
et al. 2007a). Due to the expressive formalism of ASP that allows us to represent, e.g.,
negation, defaults, aggregates, recursive definitions, and due to the continuous improvements
of efficiency of solvers, ASP has been used in a wide range of knowledge-intensive
applications from different fields, such as product configuration (Soininen and Niemelä 1998),
planning (Lifschitz 1999), phylogeny reconstruction (Brooks et al. 2007), developing a
decision support system for a space shuttle (Nogueira et al. 2001), multi-agent planning (Son
et al. 2009), and answering biomedical queries (Bodenreider et al. 2008). For many of these
applications, finding similar/diverse solutions (and thus the methods we have developed
for computing similar/diverse solutions in ASP) could be useful.
The main contributions of this paper can be summarized as follows. 

• We have described two main kinds of computational problems related to finding
similar/diverse solutions of a given problem, in the context of ASP (Section 3). Both
kinds of problems take as input an ASP program $\mathcal{P}$ that describes a problem, a
distance measure $\Delta$ that maps a set of solutions of the problem to a nonnegative
integer, and two nonnegative integers $n$ and $k$. One problem asks for a set $S$ of size $n$
that contains $k$-similar (resp. $k$-diverse) solutions, i.e., $\Delta(S) \le k$ (resp. $\Delta(S) \ge k$);
the other problem asks, given a set $S$ of $n$ solutions, for a $k$-close (resp. $k$-distant)
solution $s \notin S$, i.e., $\Delta(S \cup \{s\}) \le k$ (resp. $\Delta(S \cup \{s\}) \ge k$). Note that, by
fixing some parameters and minimizing/maximizing others, we can turn them into
various related optimization problems.
• We have studied the computational complexity of these decision/optimization
problems, establishing completeness results under reasonable assumptions for the
problem parameters (Section 4).

• We have introduced an offline method to compute a set of n k-similar (resp. k-diverse)
solutions to a given problem, by computing all solutions in advance using
ASP and then finding similar (resp. diverse) solutions using some clustering methods,
possibly in ASP as well (Section 5.1.1). This method is sound and complete,
assuming that the ASP formulations are correct.

• We have introduced three online methods to compute a set of n k-similar (resp.
k-diverse) solutions to a given problem (Sections 5.1.2, 5.1.3, and 5.1.4):

– Online Method 1 reformulates the given program to compute n distinct solutions
and formulates the distance function as an ASP program, so that all n
k-similar (resp. k-diverse) solutions can be extracted from an answer set for
the union of these ASP programs.

– Online Method 2 does not modify the ASP program encoding the problem, but
formulates the distance function as an ASP program, so that a k-close
(resp. k-distant) solution can be extracted from an answer set for the union
of these ASP programs and a previously computed solution; by iteratively
computing k-close (resp. k-distant) solutions one after another, we can compute
online a set of n k-similar (resp. k-diverse) solutions.

– Online Method 3 does not modify the ASP encoding of the problem, and does
not formulate the distance function as an ASP program, but it modifies the
search algorithm of an ASP solver, in our case CLASP (Gebser et al. 2007b), to
compute all n k-similar (resp. k-diverse) solutions at once. The distance function
is implemented in C++; in that sense, Online Method 3 allows for finding
similar/diverse solutions when the distance function cannot be defined in ASP.

All the methods are sound, assuming that the ASP formulations are correct. Online
Method 1 is complete; however, Online Methods 2 and 3 are not, because the computation
of the similar/diverse solutions depends on the first solution computed by
CLASP.

• We have illustrated the applicability of these approaches on two sorts of problems:
phylogeny reconstruction (based on the ASP encoding of the problem as in (Brooks
et al. 2007)) and planning (based on the ASP encoding of the Blocks World as in
(Erdem 2002)).

– For phylogeny reconstruction, we have defined novel distance measures for a
set of phylogenies (Section 6.1), described how the offline method and the
online methods are applied to find similar/diverse phylogenies (Section 6.2), and
compared the efficiency and effectiveness of these methods on the family of
Indo-European languages studied in (Brooks et al. 2007) (Section 6.3). Since
there is no phylogenetic system that helps experts to analyze phylogenies by
comparing them, this particular application of our methods also plays a
significant role in phylogenetics. In fact, the offline method and Online Method 3 are
integrated in the phylogenetics system Phylo-ASP (Erdem 2009).
– For planning, we have considered the action-based Hamming distance of (Srivastava
et al. 2007) to measure the distance among plans, and compared the
efficiency and effectiveness of the offline method and the online methods on
some Blocks World problems (Section 7).

Finding similar/diverse solutions has earlier been studied in the context of propositional
logic (Bailleux and Marquis 1999), constraint programming (Hebrard et al. 2005; Hebrard
et al. 2007), and automated planning (Srivastava et al. 2007). These studies consider the
Hamming distance (Hamming 1950) as a measure to compute distances between solutions.
Unlike the problems studied in related work, the problems we have studied are not confined
to polynomial-time distance functions with polynomial range. A more detailed discussion
on related work is presented in Section 8.



2 Answer Set Programming 

We study finding similar/diverse solutions in the context of Answer Set Programming
(ASP) (Lifschitz 2008), a new declarative programming paradigm where the idea is to
represent a combinatorial search problem as a "program" whose models (called "answer
sets" (Gelfond and Lifschitz 1991)) correspond to the solutions. This is in the vein of SAT
solving, which became popular after a surprising success in the area of planning (Kautz
and Selman 1992), but ASP offers in comparison features like variables ranging over domain
elements, easy definition of transitive closure, and nonmonotonic negation. Furthermore, a
range of special constructs, such as aggregates, weight constraints and priorities, that are
useful in practical applications are supported by various ASP solvers; for more discussion,
see Section 8.

Before we proceed to discuss our methods for finding similar/diverse solutions in ASP,
let us present the syntax of the kind of programs considered in this paper, and define the
concept of an answer set for such programs.¹

Programs The syntax of formulas, rules and programs is defined as follows. Formulas are
formed from propositional atoms and 0-place connectives $\top$ and $\bot$ using negation (written
as not), conjunction (written as a comma) and disjunction (written as a semicolon).
A rule is an expression of the form

$$F \leftarrow G \tag{1}$$

where $F$ is an atom or $\bot$, and $G$ is a formula; $F$ is called the head and $G$ is called the body
of the rule. A rule of the form $F \leftarrow \top$ will be identified with the formula $F$. A rule of the
form $\bot \leftarrow G$ (called a constraint) will be abbreviated as $\leftarrow G$.

A (normal nested) program is a finite set of rules. If the bodies of all rules in a program are
of the form

$$A_1, \dots, A_m, \mathit{not}\ A_{m+1}, \dots, \mathit{not}\ A_n$$

where each $A_i$ is an atom, then the program is a normal program. A program is positive if it does not contain any
negation.

¹ Answer sets are defined for programs of a more general form that may contain classical negation and
disjunction (Gelfond and Lifschitz 1991) and nested expressions (Lifschitz et al. 1999) in heads of rules as well.
See (Lifschitz 2010) for definitions of answer sets.

Answer Sets To define the concept of an answer set for a program, let us first define the 
satisfaction relation and the reduct of a program. 

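The satisfaction relation $X \models F$ between a set $X$ of atoms and a formula $F$ is defined
recursively, as follows:

• for an atom $A$, $X \models A$ if $A \in X$

• $X \models \top$

• $X \not\models \bot$

• $X \models (F, G)$ if $X \models F$ and $X \models G$

• $X \models (F; G)$ if $X \models F$ or $X \models G$

• $X \models \mathit{not}\ F$ if $X \not\models F$.

We say that $X$ satisfies a program $\Pi$ (symbolically, $X \models \Pi$) if, for every rule $F \leftarrow G$
in $\Pi$, $X \models F$ whenever $X \models G$.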

The reduct $F^X$ of a formula $F$ with respect to a set $X$ of atoms is defined recursively,
as follows:

• if $F$ is an atom or a 0-place connective, then $F^X = F$

• $(F, G)^X = F^X, G^X$

• $(F; G)^X = F^X; G^X$

• $(\mathit{not}\ F)^X = \bot$ if $X \models F$, and $(\mathit{not}\ F)^X = \top$ otherwise.

The reduct $\Pi^X$ of a program $\Pi$ with respect to $X$ is the set of rules

$$F^X \leftarrow G^X$$

for all rules $F \leftarrow G$ in $\Pi$.

Let us first define the answer set for a program $\Pi$ that does not contain negation. We
say that $X$ is an answer set for $\Pi$ if $X$ is minimal with respect to set inclusion ($\subseteq$) among
the sets of atoms that satisfy $\Pi$. For instance, the set $\{p\}$ is the answer set for the program
consisting of the single rule

$$p \leftarrow \tag{2}$$

Now consider a program $\Pi$ that may contain negation. A set $X$ of atoms is an answer
set for $\Pi$ if it is the answer set for the reduct $\Pi^X$. For instance, the reduct of the program

$$p \leftarrow \mathit{not}\ \mathit{not}\ p \tag{3}$$

relative to $\{p\}$ is (2). Since $\{p\}$ is the answer set for (2), $\{p\}$ is an answer set for program
(3). Similarly, the reduct of (3) relative to $\{\}$ is $p \leftarrow \bot$, whose answer set is $\{\}$; so $\{\}$ is an answer set for program (3) as well.
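The definitions of satisfaction, reduct, and answer set can be checked mechanically on small examples. The Python sketch below is only an illustration under our own assumptions: formulas are encoded as nested tuples (an atom is a string, "top" and "bot" stand for $\top$ and $\bot$, and ("and", F, G), ("or", F, G), ("not", F) are the connectives), a rule is a (head, body) pair, and minimality is tested by brute force, which is feasible only for tiny programs. All function names are hypothetical.

```python
from itertools import combinations

TOP, BOT = "top", "bot"

def sat(X, F):
    """Satisfaction X |= F for the nested-tuple encoding."""
    if F == TOP:
        return True
    if F == BOT:
        return False
    if isinstance(F, str):  # an atom
        return F in X
    op = F[0]
    if op == "and":
        return sat(X, F[1]) and sat(X, F[2])
    if op == "or":
        return sat(X, F[1]) or sat(X, F[2])
    if op == "not":
        return not sat(X, F[1])
    raise ValueError(f"unknown connective {op!r}")

def reduct(F, X):
    """Reduct F^X: every subformula 'not G' becomes bot if X |= G, else top."""
    if isinstance(F, str):  # atoms and 0-place connectives stay unchanged
        return F
    if F[0] == "not":
        return BOT if sat(X, F[1]) else TOP
    return (F[0], reduct(F[1], X), reduct(F[2], X))

def satisfies_program(X, program):
    # X |= program iff for every rule, X |= head whenever X |= body
    return all(sat(X, head) or not sat(X, body) for head, body in program)

def is_answer_set(X, program):
    """X is an answer set iff X is a minimal model of the reduct program^X."""
    red = [(reduct(h, X), reduct(b, X)) for h, b in program]
    if not satisfies_program(X, red):
        return False
    # minimality: no proper subset of X may satisfy the reduct
    for r in range(len(X)):
        for Y in combinations(sorted(X), r):
            if satisfies_program(set(Y), red):
                return False
    return True

# Program (3):  p <- not not p
prog3 = [("p", ("not", ("not", "p")))]
print(is_answer_set({"p"}, prog3), is_answer_set(set(), prog3))  # True True
```

Running it on program (3) confirms that both $\{p\}$ and $\{\}$ are answer sets, as derived above.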

Representing a Problem in ASP The idea of ASP is to represent a computational problem
as a program whose answer sets correspond to the solutions of the problem, and to find the
answer sets for that program using an answer set solver.

When we represent a problem in ASP, two kinds of rules play an important role: those
that "generate" many answer sets corresponding to "possible solutions", and those that can
be used to "eliminate" the answer sets that do not correspond to solutions. Rules such as (3) are
of the former kind: rule (3) generates the answer sets $\{p\}$ and $\{\}$. Constraints are of the latter
kind. For instance, adding the constraint

$$\leftarrow p$$

to program (3) eliminates the answer sets for the program that contain $p$.
In ASP, we use special constructs of the form

$$\{A_1, \dots, A_n\}^c \tag{4}$$

(called choice expressions), and of the form

$$l \le \{A_1, \dots, A_n\} \le u \tag{5}$$

(called cardinality expressions) where each $A_i$ is an atom and $l$ and $u$ are nonnegative
integers denoting the "lower bound" and the "upper bound" (Simons et al. 2002). Programs
using these constructs can be viewed as abbreviations for normal nested programs defined
above, due to (Ferraris and Lifschitz 2005). For instance, the following program

$$\{p\}^c \leftarrow$$

stands for program (3). The constraint

$$\leftarrow 2 \le \{p, q, r\}$$

stands for the constraints

$$\leftarrow p, q$$

$$\leftarrow p, r$$

$$\leftarrow q, r$$

Expression (4) describes subsets of $\{A_1, \dots, A_n\}$. Such expressions can be used in
heads of rules to generate many answer sets. For instance, the answer sets for the program

$$\{p, q, r\}^c \leftarrow \tag{6}$$

are arbitrary subsets of $\{p, q, r\}$. Expression (5) describes the subsets of the set $\{A_1, \dots, A_n\}$
whose cardinalities are at least $l$ and at most $u$. Such expressions can be used in constraints
to eliminate some answer sets. For instance, adding the constraint

$$\leftarrow 2 \le \{p, q, r\}$$

to program (6) eliminates the answer sets for (6) whose cardinalities are at least 2. Adding
the constraint

$$\leftarrow \mathit{not}\ (1 \le \{p, q, r\}) \tag{7}$$

to program (6) eliminates the answer sets for (6) whose cardinalities are not at least 1.
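We abbreviate the rules

$$\{A_1, \dots, A_n\}^c \leftarrow \mathit{Body}$$

$$\leftarrow \mathit{Body}, \mathit{not}\ (l \le \{A_1, \dots, A_n\})$$

$$\leftarrow \mathit{Body}, \mathit{not}\ (\{A_1, \dots, A_n\} \le u)$$

by

$$l \le \{A_1, \dots, A_n\}^c \le u \leftarrow \mathit{Body}.$$

For instance, rules (6), (7) and $\leftarrow \mathit{not}\ (\{p, q, r\} \le 1)$ can be written as

$$1 \le \{p, q, r\}^c \le 1 \leftarrow$$

whose answer sets are the singleton subsets of $\{p, q, r\}$.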
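Since the answer sets of program (6) are exactly the subsets of $\{p, q, r\}$, the effect of the cardinality constraints above can be reproduced by plain enumeration. The following Python sketch is a brute-force illustration, not how an answer set solver proceeds; the helper names subsets and card are hypothetical.

```python
from itertools import chain, combinations

ATOMS = ("p", "q", "r")

def subsets(atoms):
    """All subsets of atoms, i.e., the answer sets of the choice rule (6)."""
    return [set(c) for c in chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))]

def card(X, atoms, lo=None, hi=None):
    """Truth value in X of the cardinality expression lo <= {atoms} <= hi."""
    count = sum(1 for a in atoms if a in X)
    return (lo is None or lo <= count) and (hi is None or count <= hi)

# <- 2 <= {p,q,r}: remove answer sets containing at least two atoms
at_most_one = [X for X in subsets(ATOMS) if not card(X, ATOMS, lo=2)]
# the same effect via the three constraints <- p,q   <- p,r   <- q,r
pairwise = [X for X in subsets(ATOMS)
            if not any(a in X and b in X for a, b in combinations(ATOMS, 2))]
# (7) <- not (1 <= {p,q,r}): remove the empty answer set
at_least_one = [X for X in subsets(ATOMS) if card(X, ATOMS, lo=1)]
# 1 <= {p,q,r}^c <= 1 <- : exactly one atom
exactly_one = [X for X in subsets(ATOMS) if card(X, ATOMS, lo=1, hi=1)]

print(sorted(sorted(X) for X in exactly_one))  # [['p'], ['q'], ['r']]
```

The pairwise list coincides with at_most_one, mirroring the statement that $\leftarrow 2 \le \{p, q, r\}$ stands for the three binary constraints.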

Finding a Solution using an Answer Set Solver Once we represent a computational prob- 
lem as a program whose answer sets correspond to solutions of the problem, we can use 
an answer set solver to compute the solutions of the problem. To present a program to an 
answer set solver, like CLASP, we need to make some syntactic modifications. 

The syntax of the input language of CLASP is more limited in some ways than the class
of programs defined above, but it includes many useful special cases. For instance, the head
of a rule can be an expression of one of the forms

$$\{A_1, \dots, A_n\}^c$$

$$\{A_1, \dots, A_n\}^c \le u$$

$$l \le \{A_1, \dots, A_n\}^c \le u$$

but the superscript and the sign $\le$ are dropped. The body can contain cardinality
expressions but the sign $\le$ is dropped.

In the input language of CLASP, :- stands for $\leftarrow$, and each rule is followed by a period.

A group of rules that follow a pattern can often be described in a compact way using
"(schematic) variables". Variables must be capitalized. For instance, the program $\Pi_n$

$$p_i \leftarrow \mathit{not}\ p_{i+1} \quad (1 \le i \le n)$$

can be presented to CLASP as follows:

% n must be given a value, e.g., by #const n=4.
index(1..n).
p(I) :- not p(I+1), index(I).

Here index is a "domain predicate" used to describe the range of variable I.

Variables can also be used "locally" to describe the list of formulas in a cardinality
expression. For instance, the rule

$$1 \le \{p_1, \dots, p_n\}^c \le 1 \leftarrow$$

can be expressed in CLASP as follows:

index(1..n).
1{p(I) : index(I)}1.

CLASP finds an answer set for a program in two stages: first it gets rid of the schematic
variables using a "grounder", like GRINGO, and then it finds an answer set for the ground
program using a DPLL-like branch and bound algorithm (outlined in Algorithm 2).
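The two stages can be imitated on the program $p_i \leftarrow \mathit{not}\ p_{i+1}$ $(1 \le i \le n)$ above with a toy Python sketch. It assumes n is given, grounds the schematic rule over the domain predicate index (the index(I) atoms are facts after grounding and are dropped), and then finds answer sets by guess-and-check with the reduct for normal programs: rules whose negative body meets the guess X are deleted, negative literals are removed from the rest, and X must equal the least model of what remains. This is our simplification for illustration; GRINGO and CLASP work very differently and far more efficiently, and the function names are hypothetical.

```python
from itertools import combinations

def ground(n):
    """Instantiate  p(I) :- not p(I+1), index(I).  for index(1..n).

    Each ground rule is (head, positive_body, negative_body); the index(I)
    atoms are facts after grounding, so they are simply dropped."""
    return [(f"p({i})", frozenset(), frozenset({f"p({i + 1})"}))
            for i in range(1, n + 1)]

def least_model(positive_rules):
    """Least model of a positive program by naive fixpoint iteration."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def answer_sets(rules):
    """Guess-and-check: X is an answer set iff X = least model of the reduct."""
    heads = sorted({h for h, _, _ in rules})
    result = []
    for r in range(len(heads) + 1):
        for guess in combinations(heads, r):
            X = set(guess)
            reduct = [(h, pos) for h, pos, neg in rules if not (neg & X)]
            if least_model(reduct) == X:
                result.append(X)
    return result

print([sorted(X) for X in answer_sets(ground(4))])  # [['p(2)', 'p(4)']]
```

For n = 4 the unique answer set is {p(2), p(4)}: p(5) is never derivable, so p(4) holds, which blocks p(3), which in turn lets p(2) hold.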
3 Computational Problems

We study various problems related to finding similar/diverse solutions to a computational
problem $P$ formulated in ASP. For that, we assume that the problem is represented as
a normal (possibly nested) program $\mathcal{P}$ whose answer sets characterize solutions of the
problem. More precisely, let $Sol(P)$ denote the set of solutions of $P$ and let $AS(\mathcal{P})$ denote
the set of answer sets of $\mathcal{P}$. Then, there is a many-to-one mapping of $AS(\mathcal{P})$ onto $Sol(P)$.
Moreover, given an answer set of $\mathcal{P}$, the corresponding solution from $Sol(P)$ can be
efficiently extracted. We also assume that a distance function that maps a set $S$ of solutions to a
number is defined, to measure how similar/diverse the solutions in $S$ are. To this end, we
consider set-distance measures $\Delta : 2^{Sol(P)} \to \mathbb{N}_0$ on solutions for $P$.
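As an illustration only (the distance measures actually used for phylogenies and plans are defined in Sections 6 and 7), one simple way to obtain a set-distance measure $\Delta$ is to lift a pairwise distance between solutions to sets by taking the maximum over all pairs. The Python sketch below assumes that solutions are identified with answer sets, i.e., sets of atoms, and uses the size of their symmetric difference as the pairwise distance; both choices and the names hamming and delta_max are hypothetical.

```python
from itertools import combinations

def hamming(s1, s2):
    """Pairwise distance: number of atoms on which two answer sets differ."""
    return len(set(s1) ^ set(s2))

def delta_max(solutions):
    """Set distance: largest pairwise Hamming distance; 0 for |S| <= 1."""
    return max((hamming(a, b) for a, b in combinations(solutions, 2)),
               default=0)

S = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
print(delta_max(S))  # 3
```

Because the maximum over an empty collection of pairs defaults to 0, this measure assigns distance 0 to every set with at most one solution.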

We are mainly interested in two sorts of problems related to computation of a diverse/similar
collection of solutions:

n k-SIMILAR SOLUTIONS (resp. n k-DIVERSE SOLUTIONS)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, and two
nonnegative integers $n$ and $k$, decide whether a set $S$ of $n$ solutions for $P$ exists
such that $\Delta(S) \le k$ (resp. $\Delta(S) \ge k$).

k-CLOSE SOLUTION (resp. k-DISTANT SOLUTION)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, a set $S$ of
solutions for $P$, and a nonnegative integer $k$, decide whether some solution $s \notin S$
for $P$ exists such that $\Delta(S \cup \{s\}) \le k$ (resp. $\Delta(S \cup \{s\}) \ge k$).
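When all solutions have been computed in advance, as in the offline method, the two decision problems can be answered by brute force. The Python sketch below assumes an explicit list sols of distinct solutions (sets of atoms) and uses a hypothetical max-pairwise Hamming measure as a stand-in for $\Delta$; it is exponential in n and only meant to make the definitions concrete.

```python
from itertools import combinations

def delta(S):
    # Hypothetical set distance: largest pairwise symmetric difference.
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def n_k_similar(sols, n, k):
    """Is there a set S of n solutions with delta(S) <= k?"""
    return any(delta(S) <= k for S in combinations(sols, n))

def n_k_diverse(sols, n, k):
    """Is there a set S of n solutions with delta(S) >= k?"""
    return any(delta(S) >= k for S in combinations(sols, n))

def k_close(sols, S, k):
    """Is there a solution s not in S with delta(S + [s]) <= k?"""
    return any(delta(list(S) + [s]) <= k for s in sols if s not in S)

sols = [frozenset("ab"), frozenset("ac"), frozenset("bcd"), frozenset("ad")]
print(n_k_similar(sols, 3, 2), n_k_diverse(sols, 3, 4))  # True False
```

Here {a,b}, {a,c}, {a,d} form three 2-similar solutions, while no pair of solutions is 4 apart, so no three solutions can be 4-diverse.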

For instance, suppose that the ASP program $\mathcal{P}$ describes the phylogeny reconstruction
problem for Indo-European languages as in (Brooks et al. 2005); so each answer set of
$\mathcal{P}$ represents a phylogeny for Indo-European languages. Using this ASP program with an
existing ASP solver, one can compute many phylogenies for the same input dataset and
with the same input parameters. Instead of analyzing all of these phylogenies manually, a
historical linguist may ask for, for instance, three phylogenies whose diversity is at least
20 with respect to some domain-independent or domain-dependent distance function $\Delta$;
this problem is an instance of the n k-DIVERSE SOLUTIONS problem where $n = 3$ and $k = 20$. On
the other hand, a historical linguist may have found two phylogenies $P_1$ and $P_2$ that are
plausible, for instance, based on some archeological evidence, and she may want to infer a
similar phylogeny whose distance from $\{P_1, P_2\}$ is at most 10; this problem is an instance
of the k-CLOSE SOLUTION problem where $k = 10$.

The first kind of problems above has two parameters, n and k, so we can fix one and try 
to minimize (resp. maximize) the distance between solutions to find the most similar (resp. 
diverse) solutions. 

n MOST SIMILAR SOLUTIONS (resp. n MOST DIVERSE SOLUTIONS)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, and a
nonnegative integer $n$, find a set $S$ of $n$ solutions for $P$ with the minimum (resp. maximum)
distance $\Delta(S)$.
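Over an explicit list of solutions, n MOST SIMILAR SOLUTIONS (resp. n MOST DIVERSE SOLUTIONS) amounts to minimizing (resp. maximizing) $\Delta$ over all n-element subsets. A brute-force Python sketch, again assuming a hypothetical max-pairwise Hamming measure for $\Delta$ and hypothetical function names:

```python
from itertools import combinations

def delta(S):
    # Hypothetical set distance: largest pairwise symmetric difference.
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def n_most_similar(sols, n):
    """A set of n solutions with minimum delta (None if fewer than n exist)."""
    return min(combinations(sols, n), key=delta, default=None)

def n_most_diverse(sols, n):
    """A set of n solutions with maximum delta (None if fewer than n exist)."""
    return max(combinations(sols, n), key=delta, default=None)

sols = [frozenset("ab"), frozenset("ac"), frozenset("bcd"), frozenset("ad")]
best = n_most_similar(sols, 3)
print(delta(best))  # 2
```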

MAXIMAL n k-SIMILAR SOLUTIONS (resp. MAXIMAL n k-DIVERSE SOLUTIONS)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, and two
nonnegative integers $n$ and $k$, find a $\subseteq$-maximal set $S$ of at most $n$ solutions for $P$ such that
$\Delta(S) \le k$ (resp. $\Delta(S) \ge k$).
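A $\subseteq$-maximal set need not be a largest one. For a monotone $\Delta$ (adding a solution never decreases $\Delta$, which holds for the hypothetical max-pairwise Hamming measure used here), a simple greedy pass produces a $\subseteq$-maximal set: a solution rejected once stays rejected, since $\Delta$ only grows with S. The Python sketch below relies on that assumption; for a non-monotone $\Delta$ the greedy check is not sufficient. The function name is hypothetical.

```python
from itertools import combinations

def delta(S):
    # Hypothetical monotone set distance: largest pairwise symmetric difference.
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def maximal_n_k_similar(sols, n, k):
    """Greedily build a subset-maximal S with |S| <= n and delta(S) <= k.

    Relies on delta being monotone: a solution rejected once can never be
    added later, because delta only grows as S grows."""
    S = []
    for s in sols:
        if len(S) < n and delta(S + [s]) <= k:
            S.append(s)
    return S

sols = [frozenset("bcd"), frozenset("ab"), frozenset("ac"), frozenset("ad")]
print(len(maximal_n_k_similar(sols, 3, 2)))                 # 1
print(len(maximal_n_k_similar(sols[1:] + sols[:1], 3, 2)))  # 3
```

The first call shows that the result depends on the order of the solutions: {b, c, d} cannot be extended without exceeding k = 2, so the returned set is maximal although a three-element 2-similar set exists.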

In the second class of problems, we can try to minimize (resp. maximize) the distance $k$
between a solution and a set of solutions, to find the closest (resp. most distant) solution.

CLOSEST SOLUTION (resp. MOST DISTANT SOLUTION)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, and a set $S$ of
solutions for $P$, find a solution $s \notin S$ for $P$ with the minimum (resp. maximum)
distance $\Delta(S \cup \{s\})$.
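For CLOSEST SOLUTION (resp. MOST DISTANT SOLUTION), it suffices to scan the solutions outside S. A brute-force Python sketch over an explicit solution list, with the same hypothetical max-pairwise Hamming measure standing in for $\Delta$:

```python
from itertools import combinations

def delta(S):
    # Hypothetical set distance: largest pairwise symmetric difference.
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def closest_solution(sols, S):
    """A solution s not in S minimizing delta(S + [s]); None if none exists."""
    candidates = [s for s in sols if s not in S]
    return min(candidates, key=lambda s: delta(list(S) + [s]), default=None)

def most_distant_solution(sols, S):
    """A solution s not in S maximizing delta(S + [s]); None if none exists."""
    candidates = [s for s in sols if s not in S]
    return max(candidates, key=lambda s: delta(list(S) + [s]), default=None)

sols = [frozenset("ab"), frozenset("ac"), frozenset("bcd"), frozenset("ad")]
S = [frozenset("ab")]
print(sorted(closest_solution(sols, S)), sorted(most_distant_solution(sols, S)))
# ['a', 'c'] ['b', 'c', 'd']
```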

We can generalize the k-CLOSE SOLUTION (resp. k-DISTANT SOLUTION) problems to sets
of solutions:

k-CLOSE SET (resp. k-DISTANT SET)

Given an ASP program $\mathcal{P}$ that formulates a computational problem $P$, a distance
measure $\Delta$ that maps a set of solutions for $P$ to a nonnegative integer, a set $S$ of
solutions for $P$, and a nonnegative integer $k$, decide whether a set $S'$ of solutions
for $P$ with $S' \neq S$ exists such that $|\Delta(S) - \Delta(S')| \le k$ (resp. $|\Delta(S) - \Delta(S')| \ge k$).
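The set version can also be checked by brute force. The Python sketch below assumes an explicit solution list, considers only nonempty candidate sets S' ≠ S of at most max_size solutions (a stand-in for the polynomial size bound assumed in Section 4; the parameter max_size and the exclusion of the empty set are our choices), and again uses a hypothetical max-pairwise Hamming measure as $\Delta$.

```python
from itertools import combinations

def delta(S):
    # Hypothetical set distance: largest pairwise symmetric difference.
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def _candidates(sols, S, max_size):
    """Nonempty sets S' != S of at most max_size solutions."""
    S = frozenset(S)
    for r in range(1, max_size + 1):
        for T in combinations(sols, r):
            if frozenset(T) != S:
                yield T

def k_close_set(sols, S, k, max_size):
    target = delta(list(S))
    return any(abs(target - delta(T)) <= k
               for T in _candidates(sols, S, max_size))

def k_distant_set(sols, S, k, max_size):
    target = delta(list(S))
    return any(abs(target - delta(T)) >= k
               for T in _candidates(sols, S, max_size))

sols = [frozenset("ab"), frozenset("ac"), frozenset("bcd"), frozenset("ad")]
S = [frozenset("ab"), frozenset("bcd")]
print(k_close_set(sols, S, 0, 2), k_distant_set(sols, S, 4, 2))  # True False
```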

Usually an expert is interested in several kinds of problems to be able to systematically
analyze solutions. For instance, a historical linguist may want to find the three most diverse
phylogenies; and after identifying one particular plausible phylogeny among them, she
may want to compute another phylogeny that is closest to it. An example of such an analysis
is shown in Section 6.3 for understanding the classification of Indo-European languages.

We note that the problems on similar/diverse solutions from above can be analogously
defined for computational problems with multiple (or possibly no) solutions in general,
and in particular for such problems with NP complexity. Since ASP can express all NP
search problems (Marek and Remmel 2003), similar/diverse solution computation
for each such problem can in fact be formulated in the framework above, with polynomial
overhead.

4 Complexity Results 

Before we discuss how the computational problems described in the previous section can
be solved in ASP, let us turn our attention to the computational complexity of the problems
presented in Section 3. In order to do so, we first make some reasonable assumptions on
some of the problem parameters.

In the following we assume that, given an answer set $s$ of $\mathcal{P}$, extracting a solution of $P$
from $s$ can be accomplished in time polynomial in the size of $s$. Moreover, w.l.o.g. we
identify $s$ with the solution it encodes, and sets $S \subseteq Sol(P)$ with corresponding sets of
answer sets from $AS(\mathcal{P})$.

We assume that all numbers are given in binary and that the given number $n$ of different
solutions to consider (respectively the size of the set $S$) for instances of the problems
n k-SIMILAR SOLUTIONS, MAXIMAL n k-SIMILAR SOLUTIONS, and n MOST SIMILAR
SOLUTIONS is polynomial in the size of the input. The same assumption applies to the size
of the sets $S'$ to consider in instances of k-CLOSE SET problems.

Table 1. Complexity results for computing similar solutions.

Problem                          Complexity
n k-SIMILAR SOLUTIONS            NP
k-CLOSE SOLUTION                 NP
MAXIMAL n k-SIMILAR SOLUTIONS    FNP//log
n MOST SIMILAR SOLUTIONS         FP^NP (FNP//log)
CLOSEST SOLUTION                 FP^NP (FNP//log)
k-CLOSE SET                      NP

Furthermore, we consider distance measures $\Delta$ such that deciding whether $\Delta(S) \le k$
(resp. whether $\Delta(S) \ge k$) for a given $k$ is in NP. Moreover, we assume that the value of
$\Delta(S)$ is bounded by an exponential in the size of $S$ (and thus has polynomially many bits
in the size of $S$). Thus, when considering $\Delta$ as an input to a problem, we assume that it
is given as the description of a nondeterministic Turing machine $M^{\le}_{\Delta}$, or $M^{\ge}_{\Delta}$, or both,
where $M^{\le}_{\Delta}$ (resp. $M^{\ge}_{\Delta}$) nondeterministically decides $\Delta(S) \le k$ (resp. $\Delta(S) \ge k$) in time
polynomial in the length of its input $S$ and $k$. Consequently, a witness for a computation of
$M^{\times}_{\Delta}$ on some input $S$ and $k$, where $\times \in \{\le, \ge\}$, is a sequence of configurations of $M^{\times}_{\Delta}$ such
that the input tape contains $S$ and $k$ in the initial configuration, successive configurations
correspond to transitions of $M^{\times}_{\Delta}$, and the final configuration accepts. In addition, we say
that $\Delta$ is normal if $|S| \le 1$ implies $\Delta(S) = 0$.

Under these assumptions, the computational complexity (cf. (Papadimitriou 1994) for
background on the subject) of the problems concerning the computation of similar/diverse
solutions we are interested in is given in Table 1. All entries are completeness results
(under usual reductions) and hardness holds even if $\Delta(S)$ is computable in polynomial
time. Moreover, the results are the same for the 'symmetric' problems, i.e., when SIMILAR
is replaced with DIVERSE, and CLOSE is replaced with DISTANT, respectively. The proofs
are included in Appendix A.



Theorem 1 

Problem n k-SIMILAR SOLUTIONS (resp. n k-DIVERSE SOLUTIONS) is NP-complete.
Hardness holds even if $\Delta(S)$ is computable in constant time and for any normal $\Delta$.

Membership for problem n k-SIMILAR SOLUTIONS (resp. n k-DIVERSE SOLUTIONS)
follows from the fact that we can guess not only a candidate set $S$ via the program $\mathcal{P}$ (since
$|S|$ is polynomially bounded) but also a witness for $\Delta(S) \le k$ (resp. $\Delta(S) \ge k$), and check
in polynomial time that every $s \in S$ is a solution and that $\Delta(S) \le k$ (resp. $\Delta(S) \ge k$).
For hardness, one simply reduces to this problem answer-set existence for normal propositional
programs, which is NP-complete. For a hardness result resorting to partial
Hamming distance, see (Bailleux and Marquis 1999).
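The membership argument is a guess-and-check one: an NP machine guesses S, and only the check must run in polynomial time. The Python sketch below shows that deterministic check under simplifying assumptions of our own: the program is a ground normal program given as (head, positive body, negative body) triples, each solution is identified with its answer set, and $\Delta$ is computable in polynomial time, so no witness for the machine deciding $\Delta(S) \le k$ is needed. The names are hypothetical.

```python
from itertools import combinations

def least_model(rules):
    """Least model of a positive program given as (head, positive_body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_answer_set(X, program):
    # Polynomial-time check for normal programs: X equals the least model
    # of the reduct (rules whose negative body meets X are deleted).
    reduct = [(h, pos) for h, pos, neg in program if not (neg & X)]
    return least_model(reduct) == X

def delta(S):
    # Hypothetical polynomial-time set distance (max pairwise Hamming).
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

def verify_certificate(program, S, n, k):
    """Accept the guessed S iff it witnesses n k-SIMILAR SOLUTIONS."""
    return (len(set(S)) == n
            and all(is_answer_set(set(s), program) for s in S)
            and delta(S) <= k)

# a <- not b.   b <- not a.   (answer sets {a} and {b})
prog = [("a", frozenset(), frozenset({"b"})),
        ("b", frozenset(), frozenset({"a"}))]
print(verify_certificate(prog, [frozenset("a"), frozenset("b")], 2, 2))  # True
```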

In our experiments with phylogeny reconstruction, by Theorem 1 we know that deciding
the existence of n k-similar (resp. k-diverse) phylogenies is NP-complete if the distance
measure is the nearest neighbor interchange distance (DasGupta et al. 1997), whose
computation is NP-hard, or if the distance measure is the nodal distance or the
comparison of descendants distance (both defined in Section 6.1), which are computable in
polynomial time. Also, in planning, if we consider the Hamming distance (Hamming 1950)
(as defined in Section 7), which is polynomially computable, or the edit distance involving
transpositions, which is conjectured to be NP-hard (Bafna and Pevzner 1998), deciding the
existence of n k-similar (resp. k-diverse) plans is NP-complete. Therefore, it makes sense
to find similar/diverse phylogenies/plans using ASP.

By similar arguments, we obtain NP-completeness for problem k-CLOSE SOLUTION (resp. k-DISTANT SOLUTION).

Theorem 2

Problem k-CLOSE SOLUTION (resp. k-DISTANT SOLUTION) is NP-complete. Hardness holds even if Δ(S) is computable in constant time and for any normal Δ.

When looking for maximal sets of solutions, we face a function problem; here we also assume a polynomial upper bound on the size of the sets S to consider (given by input n and our corresponding assumption). Recall that function problems generalize decision problems by asking for a finite, possibly empty set of solutions for every problem instance. The solutions to function problems can be computed by transducers, i.e., possibly nondeterministic Turing machines equipped with an output tape, which contains a solution if the input is accepted. Note that if the Turing machine is nondeterministic, then it computes a multi-valued (partial) function. For instance, FNP is the class of multi-valued function problems that can be solved by a nondeterministic transducer in polynomial time, such that a given solution candidate can be checked in polynomial time.

In particular, MAXIMAL n k-SIMILAR SOLUTIONS (resp. MAXIMAL n k-DIVERSE SOLUTIONS) is solvable in FNP//log. Intuitively, FNP//log is the class of function problems solvable in polynomial time using a nondeterministic Turing machine with an output tape that may consult, once, an oracle computing the optimal value of an optimization problem whose associated decision problem is in NP, provided that this value has logarithmically many bits in the size of the input (see, e.g., (Chen and Toda 1995; Eiter and Subrahmanian 1999) for more information on FNP//log and the other function classes used in this section).

Theorem 3 

Problem MAXIMAL n k-SIMILAR SOLUTIONS (resp. MAXIMAL n k-DIVERSE SOLUTIONS) is FNP//log-complete. Hardness holds even if Δ(S) is computable in polynomial time.

Membership can be shown by computing, with the oracle, the maximum cardinality of a set S of at most n solutions. Clearly, computing the maximum cardinality c of such a set S is an optimization problem whose associated decision problem is the following: decide whether a given c (with c ≤ n) is the cardinality of a k-similar set S of (at most n) solutions. Since the latter problem is in NP (guess S and check in polynomial time whether |S| = c, Δ(S) ≤ k, and every s ∈ S is a solution), the optimization problem is amenable to the oracle, provided that the computed value (the optimal c) has logarithmically many bits in the size of the input. Since |S| is polynomially bounded in the size of the input, it has logarithmically many bits, as required. Once the optimal value is computed, one can nondeterministically compute a set S of the respective size together with a witness for Δ(S) ≤ k, and check in polynomial time that this is indeed the case.

Hardness can be shown by a reduction of X-MinModel (cf. (Chen and Toda 1995)). We remark that the slightly different problem asking for a polynomial-size set S of solutions such that Δ(S) is minimal (respectively maximal), again under the assumption that this value has logarithmically many bits, is also FNP//log-complete. For this variant, hardness can be shown, e.g., for a Δ(S) that takes the minimal (respectively maximal) Hamming distance between answer sets in S on a subset of the atoms; note that such a partial Hamming distance is a natural measure for problem encodings, where the disagreement on output atoms is measured.

Theorem 4 

Problem n MOST SIMILAR SOLUTIONS (resp. n MOST DIVERSE SOLUTIONS) is FP^NP-complete, and FNP//log-complete if the value of Δ(S) is polynomial in the size of S. Hardness holds even if Δ(S) is computable in polynomial time.

FP^NP-membership of n MOST SIMILAR SOLUTIONS (resp. n MOST DIVERSE SOLUTIONS) is obtained by first using the NP oracle to compute the minimum distance by binary search (deciding polynomially many n k-SIMILAR SOLUTIONS problems). Then, the oracle is used to compute some suitable S in polynomial time. Hardness follows from a reduction of the Traveling Salesman Problem (TSP). Notably, if the distances are polynomial in the size of the input, i.e., if the value of Δ(S) is polynomially bounded in the size of S, then the problem is FNP//log-complete.

Proceeding similarly as before, completeness for FP^NP (resp. FNP//log if Δ(S) is small) is obtained for CLOSEST SOLUTION (and for MOST DISTANT SOLUTION):

Theorem 5 

Problem CLOSEST SOLUTION (resp. MOST DISTANT SOLUTION) is FP^NP-complete, and FNP//log-complete if the value of Δ(S) is polynomial in the size of S. Hardness holds even if Δ(S) is computable in polynomial time.

For the generalization of k-CLOSE SOLUTION (resp. of k-DISTANT SOLUTION) to sets, namely k-CLOSE SET (resp. k-DISTANT SET), NP-completeness holds by similar arguments as for the former problem(s):

Theorem 6 

Problem k-CLOSE SET (resp. k-DISTANT SET) is NP-complete. Hardness holds even if Δ(S) is computable in constant time and for any normal Δ.

Discussion. The results above, summarized in Table 1, show that computing similar solutions is intractable in general. This already holds under the reasonable assumption that the distance measure Δ is normal, where all considered decision problems are NP-complete.

The precise complexity characterization of the search problems (MAXIMAL n k-SIMILAR SOLUTIONS, n MOST SIMILAR SOLUTIONS, and CLOSEST SOLUTION) reveals some information about the type of algorithm we can expect to be suitable for solving these problems in practice (for background, see (Chen and Toda 1995) and references therein). In particular, we should not expect that they can be solved by parallelization to NP problems in polynomial time, i.e., by solving polynomially many NP problems (e.g., SAT instances) in parallel and then combining the results. On the other hand, for problem MAXIMAL n k-SIMILAR SOLUTIONS this is possible under randomization, i.e., with high probability of a correct outcome, due to the characteristics of FNP//log, while this is not the case for the problems n MOST SIMILAR SOLUTIONS and CLOSEST SOLUTION in the general case. Rather, the results suggest that consecutive, dependent calls to NP oracles are needed. Intuitively, backtracking-style algorithms, which explore the search space to find solutions and then seek to (dis)prove optimality by finding better solutions, appropriately reflect this adaptivity.

However, from a worst-case complexity perspective, a simple realization of such a scheme may not be optimal, as far too many solution improvements (exponentially many, resp. polynomially many under "small" distance values) may happen until an optimal solution is found; here a two-phase algorithm (first compute the optimal solution cost by binary search, and then a solution of that cost, e.g., with backtracking) gives better guarantees. In practice, one may intertwine bound and solution computation and conduct a binary search over computations of solutions within a given bound.

In the next section, we first consider solving the search-problem analog of the decision problem n k-SIMILAR SOLUTIONS, using different approaches, ranging from declarative encodings in ASP over the explicit or implicit set of solutions, to a generalized backtracking algorithm for evaluating ASP programs. We then consider solving the related search problems n MOST SIMILAR SOLUTIONS and MAXIMAL n k-SIMILAR SOLUTIONS based on the above considerations. Finally, we discuss how we can solve the problems k-CLOSE SOLUTION, CLOSEST SOLUTION and k-CLOSE SET utilizing the methods introduced for n k-SIMILAR SOLUTIONS and its variants.

5 Computing Similar/Diverse Solutions

Now that we have a better understanding of the computational problems, let us present our computational methods to find n k-similar/diverse solutions, n most similar/diverse solutions and maximal n k-similar/diverse solutions for a given computational problem P. Since computing similar solutions and computing diverse solutions are symmetric, for simplicity, let us focus only on the problems related to similarity. In the following, suppose that the problem P is described by an ASP program Solve.lp.

5.1 Computing n k-Similar Solutions

To compute a set of n solutions whose distance is at most k, we introduce an offline method and three online methods.

5.1.1 Offline Method

In the offline method, we compute the set S of all the solutions for P in advance, using the ASP program Solve.lp with an existing ASP solver. Then, we use some clustering method to find similar solutions in S. The idea is to form clusters of n solutions, measure the distance of each cluster, and pick a cluster whose distance is less than or equal to k.

Algorithm 1 Offline Method
Input: A set S of solutions, a distance function d : S × S → ℕ, and two nonnegative integers n and k.
Output: A set C of n solutions whose distance is at most k.
  V ← define a set of vertices, each denoting a unique solution in S;
  E ← {{v_i, v_j} | v_i ≠ v_j, v_i and v_j denote s_i, s_j ∈ S, d(s_i, s_j) ≤ k};
  C ← find a clique of size n in (V, E);
  return C

We can compute clusters of n solutions whose distance is at most k by means of a graph problem: build a complete graph G whose nodes correspond to the solutions in S and whose edges are labeled by the distances between the corresponding solutions, and decide whether there is a clique C of size n in G whose weight (i.e., the distance of the set of solutions denoted by the clique) is less than or equal to k. The set of vertices in the clique represents n k-similar solutions.

The weight of a clique (or the distance Δ of the solutions in the cluster) can be computed as follows. Given a function d that measures the distance between two solutions, let Δ(S) be the maximum distance between any two solutions in S. Then n k-similar solutions can be computed by Algorithm 1, where the graph G is built as follows: nodes correspond to solutions in S, and there is an edge between two nodes s1 and s2 in G if d(s1, s2) ≤ k. Since Δ is the maximum pairwise distance, a set of n solutions satisfies Δ ≤ k exactly when every two of them are joined by an edge; hence the nodes of a clique of size n in this graph correspond to n k-similar solutions. Such a clique can be computed using the ASP formulation in (Lifschitz 2008), or one of the existing exact/approximate algorithms discussed in (Gutin 2003).
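As a rough illustration of Algorithm 1, here is a minimal Python sketch. It assumes that all solutions have already been enumerated (e.g., by an ASP solver) into a list of hashable values, and it looks for the clique by exhaustive search over all n-subsets, which is exponential and only meant for small inputs; it is neither the ASP clique encoding nor one of the algorithms cited above. The function name `offline_n_k_similar` is hypothetical.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Optional, Sequence, Set, Tuple


def offline_n_k_similar(
    solutions: Sequence[Hashable],
    d: Callable[[Hashable, Hashable], int],
    n: int,
    k: int,
) -> Optional[List[Hashable]]:
    """Return n solutions whose pairwise distances are all <= k, or None."""
    # V: one vertex per (unique) solution, identified by its index
    vertices = range(len(solutions))
    # E: an edge {i, j} (stored as i < j) whenever d(s_i, s_j) <= k
    edges: Set[Tuple[int, int]] = {
        (i, j)
        for i, j in combinations(vertices, 2)
        if d(solutions[i], solutions[j]) <= k
    }
    # Brute-force search for a clique of size n in (V, E)
    for candidate in combinations(vertices, n):
        if all(pair in edges for pair in combinations(candidate, 2)):
            return [solutions[i] for i in candidate]
    return None
```

For example, with the solutions {1}, {1, 2}, {5, 6, 7} and the Hamming distance `lambda a, b: len(a ^ b)`, the call with n = 2 and k = 1 returns the first two solutions. The pairwise edge test is sufficient only because Δ is the maximum pairwise distance here; for other set distances a clique need not be k-similar.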



5.1.2 Online Method 1: Reformulation

Instead of computing all the solutions in advance as in the offline method, we can compute n k-similar solutions to the given problem P on the fly. First, we reformulate the ASP program Solve.lp in such a way that it computes n distinct solutions; let us call the reformulation SolveN.lp. Such a reformulation can be obtained from Solve.lp as follows:

1. We specify the number of solutions: solution(1..n).

2. In each rule of the program Solve.lp, we replace each atom p(T1, T2, ..., Tm) (except the ones specifying the input) with p(N, T1, T2, ..., Tm), and add solution(N) to the body.

3. Now we have a program that computes n solutions. To ensure that they are distinct, we add a constraint which expresses that every two solutions among these n solutions are different from each other.
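Step 2 is a purely syntactic rewriting. The following Python sketch shows it on a toy rule representation. The `Atom`/`Rule` classes, the function names and the choice of `N` as the fresh index variable are assumptions made for illustration; only normal rules and integrity constraints are covered (no choice rules, aggregates or built-in comparisons), and the distinctness constraint of step 3 is not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...] = ()
    negated: bool = False  # True for a default-negated literal "not p(...)"

    def __str__(self) -> str:
        core = f"{self.pred}({','.join(self.args)})" if self.args else self.pred
        return f"not {core}" if self.negated else core


@dataclass(frozen=True)
class Rule:
    head: Optional[Atom]  # None encodes an integrity constraint ":- body."
    body: Tuple[Atom, ...] = ()

    def __str__(self) -> str:
        head = str(self.head) if self.head is not None else ""
        if not self.body:
            return f"{head}."
        sep = " :- " if head else ":- "
        return f"{head}{sep}{', '.join(str(a) for a in self.body)}."


def solution_fact(n: int) -> str:
    """Step 1: fix the number of solutions."""
    return f"solution(1..{n})."


def reformulate(rule: Rule, input_preds: FrozenSet[str], var: str = "N") -> Rule:
    """Step 2: add the index variable to every non-input atom and add
    solution(var) to the body. Assumes `var` does not already occur in the rule."""

    def lift(atom: Atom) -> Atom:
        if atom.pred in input_preds:
            return atom  # atoms specifying the input stay unchanged
        return Atom(atom.pred, (var,) + atom.args, atom.negated)

    head = lift(rule.head) if rule.head is not None else None
    body = tuple(lift(a) for a in rule.body) + (Atom("solution", (var,)),)
    return Rule(head, body)
```

For instance, with `vertex` and `edge` as input predicates, the rule `in(X) :- vertex(X), not out(X).` becomes `in(N,X) :- vertex(X), not out(N,X), solution(N).`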

Next, we describe the distance function Δ as an ASP program, Distance.lp. In addition, we represent the constraints on the distance function (e.g., the distance of the solutions in S is at most k) as an ASP program, Constraint.lp. Then we can compute n distinct solutions for the given problem P that are k-similar by one call of an existing ASP solver with the program SolveN.lp ∪ Distance.lp ∪ Constraint.lp, as shown in Fig. 1.

[Figure: SolveN.lp (computes a set S of n solutions to the given problem P), Distance.lp (computes the distance of S) and Constraint.lp (eliminates the sets of solutions whose distance is greater than k) are given together to the ASP solver, which outputs n k-similar solutions.]
Fig. 1. Computing n k-similar solutions, with Online Method 1.

Let us give an example to illustrate Online Method 1.

Example 1

Suppose that we want to compute n k-similar cliques in a graph. Assume that the similarity of two cliques is measured by the Hamming distance: the distance between two cliques C and C' is equal to the number of vertices on which they differ, |(C \ C') ∪ (C' \ C)|. The distance of a set S of cliques can be defined as the maximum distance between any two cliques in S. The clique problem can be represented in ASP (Solve.lp) as in (Lifschitz 2008), also shown in Appendix B (Fig. B.1). The reformulation (SolveN.lp) of this ASP program as described above can be seen in Fig. B.2 of Appendix B; this reformulation computes n distinct cliques.

The Hamming distance between any two cliques can be represented by the ASP program (Distance.lp) shown in Fig. B.3 of Appendix B. Finally, Fig. B.4 shows the constraint (Constraint.lp) that eliminates the sets whose distance is above k.

An answer set for the union of these three programs, SolveN.lp ∪ Distance.lp ∪ Constraint.lp, corresponds to n k-similar cliques.
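To make the three components of Example 1 concrete, the following Python sketch checks what an answer set of SolveN.lp ∪ Distance.lp ∪ Constraint.lp guarantees for a candidate set: the cliques are pairwise distinct, each one is a clique, and their maximum pairwise Hamming distance is at most k. It is an illustrative checker, not the ASP encodings of Appendix B; representing cliques as frozensets of vertex names and reading edges as undirected pairs are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Set, Tuple

Clique = FrozenSet[str]


def is_clique(c: Clique, edges: Set[Tuple[str, str]]) -> bool:
    # every two distinct vertices of c must be adjacent (edges read as undirected)
    return all((x, y) in edges or (y, x) in edges
               for x, y in combinations(sorted(c), 2))


def hamming(c1: Clique, c2: Clique) -> int:
    # |(C \ C') ∪ (C' \ C)| is the size of the symmetric difference
    return len(c1 ^ c2)


def delta(cliques: Sequence[Clique]) -> int:
    # distance of a set of cliques: maximum pairwise Hamming distance (0 for < 2 cliques)
    return max((hamming(a, b) for a, b in combinations(cliques, 2)), default=0)


def is_n_k_similar(cliques: Sequence[Clique], edges: Set[Tuple[str, str]],
                   n: int, k: int) -> bool:
    return (len(set(cliques)) == len(cliques) == n          # n distinct cliques
            and all(is_clique(c, edges) for c in cliques)   # role of SolveN.lp
            and delta(cliques) <= k)                        # Distance.lp + Constraint.lp
```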



5.1.3 Online Method 2: Iterative Computation

This method does not modify the given ASP program Solve.lp as Online Method 1 does, but it still formulates the distance function and the distance constraints as ASP programs. The idea is to find similar/diverse solutions iteratively, where the i-th solution is k-close to the previously computed i − 1 solutions (Fig. 2). Here n iterations lead to n solutions whose distance is at most k (i.e., n k-similar solutions).

[Figure: Solve.lp (computes a solution s to the given problem P), Distance.lp (computes the distance of S ∪ {s}) and Constraint.lp (eliminates the solutions that are not k-close to S) are given to the ASP solver together with a set S of k-similar solutions; each run outputs a k-close solution.]
Fig. 2. Computing n k-similar solutions, with Online Method 2. Initially S = ∅. In each run, a solution is computed and added to S, until |S| = n. The distance function and the constraints in the program ensure that, when we add the computed solution to S, the set stays k-similar.

Note that, like the Offline Method and Online Method 1, this method is sound; however, unlike them, it is not complete, since the computation of each solution depends on the previously computed solutions. The method may not return n k-similar solutions (even if they exist) if the previously computed solutions comprise a bad solution set.
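The following Python sketch simulates Online Method 2 and shows why it is incomplete. As an assumption, each solver call on Solve.lp ∪ Distance.lp ∪ Constraint.lp is replaced by a scan of an explicit list of solutions that takes the first new solution s with Δ(S ∪ {s}) ≤ k; solutions are sets of atoms and Δ is the maximum pairwise Hamming distance. Passing `n=None` keeps iterating until no k-close solution remains, which corresponds to the variant for maximal sets discussed in Section 5.3. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence

Solution = FrozenSet[str]
Delta = Callable[[Sequence[Solution]], int]


def delta_max(S: Sequence[Solution]) -> int:
    # example Δ: maximum pairwise Hamming distance between solutions (sets of atoms)
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)


def next_k_close(solutions: Sequence[Solution], chosen: List[Solution],
                 k: int, delta: Delta) -> Optional[Solution]:
    # stand-in for one solver call: the first new solution s with Δ(chosen ∪ {s}) <= k
    for s in solutions:
        if s not in chosen and delta(chosen + [s]) <= k:
            return s
    return None


def iterative_similar(solutions: Sequence[Solution], n: Optional[int], k: int,
                      delta: Delta = delta_max) -> Optional[List[Solution]]:
    """Online Method 2; with n=None, iterate until no k-close solution is left."""
    chosen: List[Solution] = []
    while n is None or len(chosen) < n:
        s = next_k_close(solutions, chosen, k, delta)
        if s is None:
            # earlier choices are never revised, hence the incompleteness
            return chosen if n is None else None
        chosen.append(s)
    return chosen
```

With the solutions {q}, {p}, {p, r} in that order, k = 1 and n = 2, the first iteration commits to {q}, which is at distance at least 2 from both other solutions, so no set is returned, although {p} and {p, r} form a 1-similar pair.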



5.1.4 Online Method 3: Incremental Computation

This method differs from the other two online methods in that it neither modifies the ASP program Solve.lp describing the given computational problem P nor formulates the distance function Δ and the distance constraints as ASP programs. Instead, it modifies the search algorithm of an existing ASP solver in such a way that the modified ASP solver can compute n k-similar solutions (Fig. 3). In this method, we modify the search algorithm of the ASP solver CLASP (Version 1.1.3); the modified version is called CLASP-NK. The given distance measure Δ is implemented as a C++ program.

Let us describe how we modified CLASP to obtain CLASP-NK. CLASP performs a DPLL-like (Davis et al. 1962; Marques-Silva and Sakallah 1999) branch and bound search to find an answer set for a given ASP program (Algorithm 2): at each level, it "propagates" some literals to be included in the answer set, "selects" new literals to branch on, or "backtracks" to an earlier appropriate point in search while "learning conflicts" to avoid redundant search.

We modify CLASP's algorithm as shown in Algorithm 3 to obtain CLASP-NK; the modifications concern the set X of computed solutions (collected until |X| = n), the lower bound computed by DISTANCE-ANALYZE, and the additional backtracking condition LowerBound > k. To use CLASP-NK, one needs to prepare an options file, NKoptions, to describe the input parameters to compute n k-similar phylogenies, such as the values n and k, along with the names of predicates that characterize solutions and that are considered for computing the distance between solutions. Note that since an answer set (thus a solution) is computed incrementally in CLASP-NK, we cannot compute the distance between a partial solution and a set of solutions with respect to the given distance function Δ. Instead, one needs to implement a heuristic function to estimate a lower bound for the distance between any completion s of a partial solution and a set S of previously computed solutions. If this heuristic function is admissible, then it does not overestimate this distance, i.e., it returns a value that is less than or equal to Δ(S ∪ {s}) for every completion s.

[Figure: Solve.lp (computes a solution to the given problem P) and Distance.cpp (computes the distance of a set of solutions) are input to CLASP-NK, which outputs n k-similar solutions.]
Fig. 3. Computing n k-similar solutions, with Online Method 3. CLASP-NK is a modification of the ASP solver CLASP that takes into account the distance function and constraints while computing an answer set, in such a way that CLASP-NK becomes biased to compute similar solutions. Each computed solution is stored by CLASP-NK until a set of n k-similar solutions is computed.

Algorithm 2 CLASP
Input: An ASP program Π
Output: An answer set A for Π
  A ← ∅   // current assignment of literals
  ∇ ← ∅   // set of conflicts
  while no answer set found do
    PROPAGATION(Π, A, ∇)   // propagate literals
    if there is a conflict in the current assignment then
      RESOLVE-CONFLICT(Π, A, ∇)   // learn and update conflicts, and backtrack
    else
      if the current assignment does not yield an answer set then
        SELECT(Π, A, ∇)   // select a literal to continue search
      else
        return A
      end if
    end if
  end while

Like Online Method 2, this method is also sound but not complete, since each solution depends on all previously computed solutions.
Algorithm 3 CLASP-NK
Input: An ASP program Π, nonnegative integers n and k
Output: A set X of n solutions that are k-similar (n k-similar solutions)
  A ← ∅   // current assignment of literals
  ∇ ← ∅   // set of conflicts
  X ← ∅   // computed solutions
  while |X| < n do
    PartialSolution ← A
    LowerBound ← DISTANCE-ANALYZE(X, PartialSolution)   // compute a lower bound for the distance between any completion of a partial solution and the set of previously computed solutions
    PROPAGATION(Π, A, ∇)   // propagate literals
    if conflict in propagation OR LowerBound > k then
      RESOLVE-CONFLICT(Π, A, ∇)   // learn and update conflicts, and backtrack
    else
      if the current assignment does not yield an answer set then
        SELECT(Π, A, ∇)   // select a literal to continue search
      else
        X ← X ∪ {A}
        A ← ∅
      end if
    end if
  end while
  return X
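The pruning idea of Algorithm 3 can be imitated on a toy scale. The Python sketch below is a plain depth-first branch and bound, not CLASP's conflict-driven search: solutions are total assignments to Boolean variables accepted by a user-supplied predicate, Δ is the maximum pairwise Hamming distance, and `distance_analyze` is an admissible lower bound (variables already assigned differently from a previous solution can only add to the distance). Rejecting solutions that were already found is our assumption, since Algorithm 3 does not state how repeats are avoided; all names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence

Assignment = Dict[str, bool]


def hamming(a: Assignment, b: Assignment) -> int:
    return sum(a[v] != b[v] for v in a)


def distance_analyze(X: Sequence[Assignment], partial: Assignment) -> int:
    """Admissible lower bound on Δ(X ∪ {s}) for every completion s of `partial`,
    with Δ = maximum pairwise Hamming distance."""
    among_x = max((hamming(a, b) for a, b in combinations(X, 2)), default=0)
    # disagreements on already assigned variables persist in every completion
    to_new = max((sum(partial[v] != s[v] for v in partial) for s in X), default=0)
    return max(among_x, to_new)


def search(variables: List[str], is_solution: Callable[[Assignment], bool],
           X: List[Assignment], k: int, partial: Assignment) -> Optional[Assignment]:
    if distance_analyze(X, partial) > k:
        return None                      # "LowerBound > k": backtrack
    if len(partial) == len(variables):
        # rejecting repeats of earlier solutions is an assumption of this sketch
        if is_solution(partial) and partial not in X:
            return dict(partial)
        return None
    var = variables[len(partial)]        # a fixed variable order stands in for SELECT
    for value in (False, True):
        partial[var] = value
        found = search(variables, is_solution, X, k, partial)
        del partial[var]
        if found is not None:
            return found
    return None


def clasp_nk_sketch(variables: List[str], is_solution: Callable[[Assignment], bool],
                    n: int, k: int) -> Optional[List[Assignment]]:
    X: List[Assignment] = []
    while len(X) < n:
        s = search(variables, is_solution, X, k, {})   # restart with A = ∅
        if s is None:
            return None                  # incomplete: earlier solutions are never revised
        X.append(s)
    return X
```

For example, with variables a, b, c and the predicate "at least two variables are true", asking for three 1-similar solutions fails after the first two have been fixed, while three 2-similar solutions are found.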



5.2 Computing n Most Similar Solutions

In the previous sections, we have described some computational methods to solve the decision problem n k-SIMILAR SOLUTIONS. Let us now discuss how we can solve the optimization problem n MOST SIMILAR SOLUTIONS. Let NKSimilar(n, k) be a function that returns, using one of the methods described in the previous subsections, a set S of n solutions that is k-similar, or the empty set if no such set exists. Using this function, we can find n most similar solutions as follows: first, we compute a lower bound and an upper bound for the distance of a set of n solutions. Then, we perform a binary search within these bounds to find a set S of solutions with the optimal value for k. Computations of a lower bound and an upper bound are usually specific to the particular problem. For instance, consider the clique problem described in Section 5.1.2. We can find two most similar cliques in a graph by specifying a lower bound on the distance and, as the upper bound, the number of vertices in the graph, and by using one of the methods described above.
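The binary search over k can be sketched in Python as follows. Here `nk_similar` plays the role of NKSimilar(n, k) (returning `None` instead of the empty set); the brute-force oracle, the Hamming-based Δ and all names are assumptions for illustration. The search relies on monotonicity: every k-similar set is also (k+1)-similar.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Solution = FrozenSet[str]
NKSimilar = Callable[[int, int], Optional[List[Solution]]]


def most_similar(nk_similar: NKSimilar, n: int, lower: int, upper: int
                 ) -> Optional[Tuple[int, List[Solution]]]:
    """Smallest k in [lower, upper] for which nk_similar(n, k) succeeds,
    together with the returned set; None if it never succeeds."""
    best = None
    while lower <= upper:
        k = (lower + upper) // 2
        S = nk_similar(n, k)
        if S is not None:
            best = (k, S)   # feasible: look for a smaller bound
            upper = k - 1
        else:
            lower = k + 1   # infeasible: the optimum is larger
    return best


def brute_force_oracle(solutions: Sequence[Solution]) -> NKSimilar:
    """Exhaustive stand-in for NKSimilar(n, k); Δ = max pairwise Hamming distance."""
    def delta(S: Sequence[Solution]) -> int:
        return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)

    def oracle(n: int, k: int) -> Optional[List[Solution]]:
        for S in combinations(solutions, n):
            if delta(S) <= k:
                return list(S)
        return None

    return oracle
```

If `nk_similar` is realized with an incomplete method such as Online Method 2 or 3, a `None` answer may be a false negative, so the returned k is then only an upper bound on the optimum.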



5.3 Computing Maximal n k-Similar Solutions

Another optimization problem we are interested in is MAXIMAL n k-SIMILAR SOLUTIONS, which asks for a maximal set of solutions whose distance is at most k. We can solve this problem by modifying Online Method 2: start with a solution (computed using Solve.lp), then repeatedly find a solution that is k-close to the previously found solutions until no such solution exists. Recall that Online Method 2 iterates n times; here the iterations continue until no k-close solution is found. Since Online Method 2 is incomplete, this method of computing maximal n k-similar solutions is incomplete as well.



5.4 Computing Close/Distant Solutions

We can solve the problem k-CLOSE SOLUTION utilizing the methods for n k-SIMILAR SOLUTIONS. For instance, we can modify Online Method 2: start with a set S of solutions, then find a solution that is k-close to S. Based on this modified method, we can solve the problem CLOSEST SOLUTION: we can compute a lower bound and an upper bound for k, and find the optimal value for k by a binary search between these bounds, as described in the method for n MOST SIMILAR SOLUTIONS.

Alternatively, for k-CLOSE SOLUTION, we can modify the ASP program 𝒫 (Solve.lp) that describes the computational problem P by adding constraints, to ensure that the answer sets of the modified program characterize the solutions for P except for the ones included in the given set S of solutions. Let us call the modified ASP program 𝒫'. Next, we define a distance measure Δ' that maps a set of solutions for P to a nonnegative integer, in terms of the given measure Δ, as follows: Δ'(X) = Δ(S ∪ X). Then, we can use one of the computational methods introduced for n k-SIMILAR SOLUTIONS with the ASP program 𝒫', the distance function Δ' and n = 1. In a similar way, we can find a solution to the problem CLOSEST SOLUTION utilizing the computational method for n MOST SIMILAR SOLUTIONS, with the ASP program 𝒫', the distance measure Δ' and n = 1.

We can solve the problem k-CLOSE SET using one of the computational methods for n k-SIMILAR SOLUTIONS as well. For instance, we can use Online Method 1 with n = 1, 2, 3, ..., m, where m is an upper bound on the number of solutions for P, until a k-close set S' of solutions is computed. For each n, we reformulate the ASP program 𝒫 to compute a set S' of n solutions and add a constraint to this reformulation to ensure that S' ≠ S when n = |S|; let us call this modified reformulation 𝒫_n. Then we try to find a k-close set S' of n solutions with the ASP program 𝒫_n and the distance measure Δ''(S') = |Δ(S) − Δ(S')|.

Alternatively, we can use Online Method 2 or 3 with the ASP program 𝒫, with the distance measure

$$\Delta''(S') = \begin{cases} \infty & \text{if } S = S' \\ |\Delta(S) - \Delta(S')| & \text{otherwise} \end{cases}$$

and with n = 1, 2, 3, ..., m, where m is an upper bound on the number of solutions for P.
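The two derived measures Δ' and Δ'' of this subsection can be written as small wrappers around a given Δ. The Python sketch below assumes that solutions are frozensets of atoms and uses, only as an example, the maximum pairwise Hamming distance for Δ; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

Solution = FrozenSet[str]
Delta = Callable[[Sequence[Solution]], float]


def delta_max(S: Sequence[Solution]) -> float:
    # example Δ only: maximum pairwise Hamming distance
    return max((len(a ^ b) for a, b in combinations(S, 2)), default=0)


def delta_prime(delta: Delta, S: Sequence[Solution]) -> Delta:
    """Δ'(X) = Δ(S ∪ X), used with the modified program and n = 1."""
    def d(X: Sequence[Solution]) -> float:
        return delta(list(S) + [x for x in X if x not in S])
    return d


def delta_double_prime(delta: Delta, S: Sequence[Solution]) -> Delta:
    """Δ''(S') = infinity if S' = S, and |Δ(S) - Δ(S')| otherwise."""
    target = delta(S)

    def d(S_prime: Sequence[Solution]) -> float:
        if set(S_prime) == set(S):
            return math.inf
        return abs(target - delta(S_prime))
    return d
```

For S = {{p}, {p, r}}, for instance, Δ(S) = 1, so the set {{p}, {s}} with Δ = 2 has Δ''-value 1 and is 1-close to S.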



6 Computing Similar/Diverse Phylogenies

Let us now illustrate the usefulness of our methods in a real-world application: reconstruction of evolutionary trees (or phylogenies) of a set of species based on their shared traits. This problem is important for research areas as disparate as genetics, historical linguistics, zoology, anthropology, and archeology. For example, a phylogeny of parasites may help zoologists to understand the evolution of human diseases (Brooks and McLennan 1991); a phylogeny of languages may help scientists to better understand human migrations (White and O'Connell 1982).



There are several software systems, such as PHYLIP (Felsenstein 2009), PAUP (Swofford 2003) or PHYLO-ASP (Erdem 2009), that can reconstruct a phylogeny for a set of taxonomic units based on the "maximum parsimony" (Edwards and Cavalli-Sforza 1964) or "maximum compatibility" (Camin and Sokal 1965) criterion. With some of these systems, such as PHYLO-ASP, we can compute many good phylogenies (most parsimonious phylogenies, perfect phylogenies, phylogenies with the largest number of compatible traits, etc.) according to the phylogeny reconstruction criterion. In such cases, in order to decide which ones are the most "plausible", domain experts manually analyze these phylogenies, since there is no available phylogenetic system that can analyze/compare these phylogenies.

For instance, PHYLO-ASP computes 45 plausible phylogenies for the Indo-European languages based on the dataset of (Brooks et al. 2007). In order to pick the most plausible phylogenies, in (Brooks et al. 2007), the historical linguist Don Ringe analyzes these phylogenies by trying to cluster them into diverse groups, each containing similar phylogenies. In such a case, a tool that reconstructs similar/diverse solutions would be useful: with such a tool, an expert can compute (instead of all solutions) a few most diverse solutions, pick the most plausible one, and then compute phylogenies that are close to this phylogeny.

Let us show how our methods can be used for this purpose. Before that, we define a 
phylogeny and some distance functions to measure the similarity/diversity of phylogenies. 



6.1 Distance Measures for Phylogenies 

A phylogeny for a set of taxa is a finite rooted leaf-labeled binary directed tree (V, E) with a set L of leaves (L ⊆ V). The set L represents the given taxonomic units, whereas the set V describes their ancestral units and the set E describes the genetic relationships between them. The labelings of leaves denote the values of shared traits at those nodes. We consider distance measures that depend on the topologies of phylogenies; therefore, we discard these labelings while defining them.



There are various measures to compute the distance between two phylogenies (Nye et al. 2006; Robinson and Foulds 1981; Hon et al. 2000; Kuhner and Felsenstein 1994; DasGupta et al. 1997). In the following, we first consider one of these domain-independent functions, the nodal distance measure (Bluis and Shin 2003), to compare two phylogenies; then we define a distance measure for a set of phylogenies based on the nodal distances of pairwise phylogenies, to show the applicability of our methods for finding n k-similar phylogenies. Next, we define a novel distance function that measures the distance of two phylogenies, and a distance function that measures the distance of a set of phylogenies, taking into account some expert knowledge specific to evolution. With this measure, we also show the effectiveness of our methods.



6.1.1 Nodal Distance of Two Phylogenies 

The nodal distance ND_P(x, y) of two leaves x and y in a phylogeny P is defined as follows. First, transform the phylogeny P (which is a directed tree) into an undirected graph G, where there is an undirected edge {i, j} in the graph for each directed edge (i, j) in the phylogeny. Then ND_P(x, y) is equal to the length of the shortest path between x and y in the undirected graph G. For example, consider the phylogeny P1 in Fig. 4: the nodal distance between a and b is 3, whereas the nodal distance between b and c is 2. Intuitively, the nodal distance between two leaves in a phylogeny represents the degree of their relationship in that phylogeny.

[Figure: the phylogenies P1 and P2.]
Fig. 4. Two phylogenies P1 = (a, (b, c)) and P2 = (b, (a, c)).

Given two phylogenies P1 and P2, both with the same set L of leaves, the nodal distance D_n(P1, P2) of the two phylogenies is calculated as follows:

$$D_n(P_1, P_2) = \sum_{\{x, y\} \subseteq L,\ x \neq y} \left| ND_{P_1}(x, y) - ND_{P_2}(x, y) \right|.$$

Here the difference of the nodal distances of two leaves x and y represents the contribution of this pair of leaves to the distance between the phylogenies.

Proposition 1 

Given two phylogenies P1 and P2 with the same set L of leaves and the same leaf-labeling function, D_n(P1, P2) can be computed in O(|L|^2) time.

Example 2 shows how to compute the nodal distance between two phylogenies. In that example, we suppose that the phylogenies are presented in the Newick format, where sister sub-phylogenies are enclosed in parentheses. For instance, the first tree, P1, of Fig. 4 can be represented in the Newick format as (a, (b, c)).

Example 2 

In order to compute the nodal distance D_n(P1, P2) between the phylogenies P1 = (a, (b, c)) and P2 = (b, (a, c)) shown in Fig. 4, we compute the nodal distances of the pairs of leaves {a, b}, {a, c} and {b, c}, and take the sum of the differences:

Pairs of leaves   Distance in P1   Distance in P2   Difference
{a, b}            3                3                0
{a, c}            3                2                1
{b, c}            2                3                1
Total distance                                      2

In this case, the distance between P1 and P2 is 2.
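The computation of Example 2 can be reproduced with the Python sketch below. It assumes that a phylogeny is given as nested pairs of leaf labels, mirroring the Newick strings above (parsing Newick text is not shown); internal nodes receive fresh integer identifiers. One breadth-first search from each leaf visits all 2|L| − 1 vertices, so all nodal distances are obtained in O(|L|^2) time, in line with Proposition 1.

```python
from collections import deque
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a pair of subtrees


def undirected_graph(tree: Tree) -> Dict[object, List[object]]:
    """Drop edge directions; leaves keep their labels, internal nodes get int ids."""
    adj: Dict[object, List[object]] = {}
    counter = 0

    def visit(t: Tree) -> object:
        nonlocal counter
        if isinstance(t, str):
            adj.setdefault(t, [])
            return t
        node = counter
        counter += 1
        adj[node] = []
        for child in t:
            c = visit(child)
            adj[node].append(c)
            adj[c].append(node)
        return node

    visit(tree)
    return adj


def nodal_distances(tree: Tree) -> Dict[Tuple[str, str], int]:
    """ND_P(x, y) for every pair of leaves x < y, by BFS from each leaf."""
    adj = undirected_graph(tree)
    leaves = sorted(v for v in adj if isinstance(v, str))
    nd: Dict[Tuple[str, str], int] = {}
    for x in leaves:
        dist = {x: 0}
        queue = deque([x])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for y in leaves:
            if x < y:
                nd[(x, y)] = dist[y]
    return nd


def nodal_distance(p1: Tree, p2: Tree) -> int:
    """D_n(P1, P2): sum over leaf pairs of |ND_P1(x, y) - ND_P2(x, y)|."""
    nd1, nd2 = nodal_distances(p1), nodal_distances(p2)
    return sum(abs(nd1[pair] - nd2[pair]) for pair in nd1)
```

With `P1 = ("a", ("b", "c"))` and `P2 = ("b", ("a", "c"))`, `nodal_distance(P1, P2)` yields the total 2 from the table.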



6.1.2 Descendant Distance of Two Phylogenies

The nodal distance measure computes the distance between two rooted binary trees and does not consider the evolutionary relations between nodes. In that sense, it is a domain-independent distance measure for comparing phylogenies. A distance measure that takes these relations into account might give more accurate results. Therefore, we define a new distance function based on our discussions with the historical linguist Don Ringe. In particular, we take into account the following domain-specific information in phylogenetics: the similarities of phylogenies towards their roots are more significant, and thus two phylogenies are more similar if the diversifications closer to their roots are more similar.

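For each vertex v of a tree T = (V, E), let us define the descendants of v as follows:

$$\mathit{desc}_T(v) = \begin{cases} \{v\} & \text{if } v \text{ is a leaf in } V \\ \mathit{desc}_T(u) \cup \mathit{desc}_T(u') & \text{otherwise, where } (v, u), (v, u') \in E,\ u \neq u' \end{cases}$$

and the depth of a vertex v as follows:

$$\mathit{depth}_T(v) = \begin{cases} 0 & \text{if } v \text{ is the root of } T \\ 1 + \mathit{depth}_T(u) & \text{otherwise, where } (u, v) \in E. \end{cases}$$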

To define the distance between two phylogenies T = (V, E) and T' = (V', E'), let us first define how two vertices v ∈ V and v' ∈ V' are compared:

$$f(v, v') = \begin{cases} 1 & \text{if } \mathit{desc}_T(v) \neq \mathit{desc}_{T'}(v') \\ 0 & \text{otherwise.} \end{cases}$$



For every depth i (0 ≤ i ≤ min{max_{v∈V} depth_T(v), max_{v'∈V'} depth_T'(v')}), let us also define a weight function weight(i) that assigns a number to each depth i. The idea is to assign bigger weights to smaller depths, so that two phylogenies are more similar if the diversifications closer to the root are more similar. This is motivated by the fact that reconstructing the evolution of languages closer to the root is more important for historical linguists.

Now we can define the distance between two trees T = (V, E) and T' = (V', E'), with roots R and R' respectively, at depth i (0 ≤ i ≤ min{max_{v∈V} depth_T(v), max_{v'∈V'} depth_T'(v')}), by the following measure:

$$g(0, T, T') = \mathit{weight}(0) \times f(R, R')$$

$$g(i, T, T') = g(i-1, T, T') + \mathit{weight}(i) \times \sum_{x \in V,\ y \in V',\ \mathit{depth}_T(x) = \mathit{depth}_{T'}(y) = i} f(x, y), \qquad i > 0$$

and the distance between two trees as follows:

$$D_l(T, T') = g\bigl(\min\{\max_{v \in V} \mathit{depth}_T(v),\ \max_{v' \in V'} \mathit{depth}_{T'}(v')\},\ T, T'\bigr).$$

Proposition 2 

Given two trees P1 and P2 with the same set L of leaves and the same leaf-labeling function, D_l(P1, P2) can be computed in O(|L|^2) time.

Example 3 shows how to compute the distance between the two trees shown in Fig. 4.

Example 3

In order to compute the descendant distance D_l(P_1, P_2) between the phylogenies P_1 = (a, (b, c)) and P_2 = (b, (a, c)) shown in Fig. 4, for each depth level we multiply the number of pairs of vertices that have different descendant sets by the weight of that depth level. Then we add up the products to find the total distance between P_1 and P_2.

Depth i   Weight of depth i   Number of pairs of vertices that have different descendant sets
0         2                   0
1         1                   4
2         0                   3

Distance = 2×0 + 1×4 + 0×3 = 4

The descendant distance between P_1 and P_2 is 4.
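The computation in Example 3 can be sketched in Python as follows. This is a minimal illustration, not the ASP encoding used in the paper; it assumes, hypothetically, that a phylogeny is given as a nested tuple of leaf labels and that the caller supplies the weight function. Note that, as in the definition of g, every pair of vertices at the same depth is compared.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Union

# Hypothetical representation: a phylogeny is a nested tuple of leaf labels,
# e.g. ("a", ("b", "c")) for P_1 in Example 3.
Tree = Union[str, tuple]


def descendant_sets_by_depth(tree: Tree) -> dict:
    """Map each depth i to the list of desc_T(v) for the vertices v at depth i."""
    levels = defaultdict(list)

    def visit(node: Tree, depth: int) -> frozenset:
        if isinstance(node, str):
            desc = frozenset([node])  # a leaf: desc_T(v) = {v}
        else:
            # an internal vertex: union of the descendant sets of its children
            desc = frozenset().union(*(visit(child, depth + 1) for child in node))
        levels[depth].append(desc)
        return desc

    visit(tree, 0)
    return levels


def descendant_distance(t1: Tree, t2: Tree, weight: Callable[[int], int]) -> int:
    """D_l(T, T'): per depth, weight(i) times the number of pairs with different descendant sets."""
    lv1 = descendant_sets_by_depth(t1)
    lv2 = descendant_sets_by_depth(t2)
    max_depth = min(max(lv1), max(lv2))
    total = 0
    for i in range(max_depth + 1):
        # sum of f(v, v') over all pairs at depth i; f is 1 iff the sets differ
        mismatches = sum(1 for d1, d2 in product(lv1[i], lv2[i]) if d1 != d2)
        total += weight(i) * mismatches  # g(i) = g(i-1) + weight(i) * (...)
    return total


p1 = ("a", ("b", "c"))
p2 = ("b", ("a", "c"))
weights = [2, 1, 0]  # weights of depths 0, 1, 2 as in Example 3
print(descendant_distance(p1, p2, lambda i: weights[i]))  # 4
```

The nested-tuple input keeps the sketch short; a real implementation would read the edge atoms of an answer set instead.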

6.1.3 Distance of a Set of Phylogenies 

In the previous subsections, we defined distance functions for measuring the distance between two phylogenies. However, the problems that we defined in Section 3 require a distance function that measures the distance of a set of phylogenies. We can define the distance of a set of phylogenies based on the distances between pairs of phylogenies. For instance, the distance of a set S of phylogenies can be defined as the maximum distance between any two phylogenies in S.

Let D be one of the distance measures defined in the previous subsections. Then, to be able to find similar phylogenies, the distance Δ_D of a set S of phylogenies is defined as follows:

$$
\Delta_D(S) = \max\{ D(P_1,P_2) \mid P_1, P_2 \in S,\ P_1 \neq P_2 \}.
$$

To be able to find diverse phylogenies, the distance Δ_D of a set S of phylogenies is defined as follows:

$$
\Delta_D(S) = \min\{ D(P_1,P_2) \mid P_1, P_2 \in S,\ P_1 \neq P_2 \}.
$$
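Both set distances reduce to a maximum or minimum over pairs of distinct members of S. A minimal Python sketch, assuming |S| ≥ 2 and using integers with |x − y| as a hypothetical stand-in for phylogenies and D:

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

Sol = TypeVar("Sol")


def delta_similar(solutions: Sequence[Sol], dist: Callable[[Sol, Sol], int]) -> int:
    """Δ_D(S) for similarity: the largest distance between two distinct members of S."""
    return max(dist(p, q) for p, q in combinations(solutions, 2))


def delta_diverse(solutions: Sequence[Sol], dist: Callable[[Sol, Sol], int]) -> int:
    """Δ_D(S) for diversity: the smallest distance between two distinct members of S."""
    return min(dist(p, q) for p, q in combinations(solutions, 2))


# Toy example: integers stand in for phylogenies, |x - y| stands in for D.
toy = [1, 4, 9]
print(delta_similar(toy, lambda x, y: abs(x - y)))  # 8
print(delta_diverse(toy, lambda x, y: abs(x - y)))  # 3
```

A small Δ_D(S) under the first definition means every pair is close; a large Δ_D(S) under the second means every pair is far apart.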
6.2 Computing n k-Similar/Diverse Phylogenies

Analogous to the n k-similar (resp. n k-diverse) solutions, we define the n k-similar (resp. n k-diverse) phylogenies as follows:

n k-SIMILAR PHYLOGENIES (RESP. n k-DIVERSE PHYLOGENIES)

Given an ASP program 𝒫 that formulates a phylogeny reconstruction problem P, a distance measure Δ_D that maps a set of phylogenies for P to a nonnegative integer, and two nonnegative integers n and k, decide whether a set S of n phylogenies exists such that Δ_D(S) ≤ k (resp. Δ_D(S) ≥ k).

Recall that in order to compute n k-similar (resp. n k-diverse) solutions we need an ASP program that computes a solution and a distance measure. We consider the ASP program phylogeny-improved.lp described in (Brooks et al. 2007) as our main program that computes a phylogeny; this program is shown in Figs. B 5 and B 6 in Appendix B. We represent the nodal distance D_n (resp. the descendant distance D_l) of two phylogenies as the ASP program in Fig. B 10 (resp. Figs. B 11 and B 12) in Appendix B. In addition, we consider the program in Fig. B 13 that computes the total distance of a set of solutions with Δ_D and eliminates the ones whose total distance is greater than k.

For the Offline Method, we compute all the phylogenies using phylogeny-improved.lp. Then we build a graph of phylogenies as in Subsection 5.1.1. Then we use the ASP program in Fig. B 1 in Appendix B to find a clique of size n in the constructed graph. This clique corresponds to n k-similar phylogenies.
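Since Δ_D(S) ≤ k holds exactly when every pair of members of S is at distance at most k, an n k-similar set is a clique of size n in a graph whose edges join solutions at distance at most k (dually, distance at least k for diversity). The brute-force Python sketch below illustrates this reading only; it is not the ASP clique program of Fig. B 1, and the function name and integer toy data are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, TypeVar

Sol = TypeVar("Sol")


def find_clique(
    solutions: Sequence[Sol],
    dist: Callable[[Sol, Sol], int],
    n: int,
    k: int,
    similar: bool = True,
) -> Optional[List[Sol]]:
    """Return n precomputed solutions forming a clique in the distance graph, or None."""

    def linked(a: Sol, b: Sol) -> bool:
        d = dist(a, b)
        return d <= k if similar else d >= k

    indices = range(len(solutions))
    # Edges of the distance graph, stored as index pairs (i, j) with i < j.
    edges = {(i, j) for i, j in combinations(indices, 2) if linked(solutions[i], solutions[j])}
    # Exhaustive search over all n-subsets: exponential, only for small inputs.
    for group in combinations(indices, n):
        if all(pair in edges for pair in combinations(group, 2)):
            return [solutions[i] for i in group]
    return None


toy = [0, 2, 3, 10]
print(find_clique(toy, lambda x, y: abs(x - y), n=3, k=3))                 # [0, 2, 3]
print(find_clique(toy, lambda x, y: abs(x - y), n=2, k=9, similar=False))  # [0, 10]
```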

For Online Method 1, we reformulate the main program phylogeny-improved.lp to obtain a program that computes n distinct phylogenies as in Section 5.1.2. The reformulation is shown in Figs. B 7 to B 9 in Appendix B.



For Online Method 3, we define a heuristic function to estimate a lower bound for the distance between any completion of a given partial phylogeny and a complete phylogeny. Let P_c be any complete phylogeny, P_p be any partial phylogeny, and L_p be the set of pairs of leaves that appear in P_p. Consider the nodal distance (Section 6.1.1) for comparing two phylogenies. Then we can define a lower bound as follows:

$$
\mathit{LB}_n(P_p, P_c) = \sum_{(x,y) \in L_p} \bigl| \mathit{ND}_{P_p}(x,y) - \mathit{ND}_{P_c}(x,y) \bigr|.
$$



This lower bound does not overestimate the distance between a phylogeny and any completion of a partial phylogeny.

Proposition 3

Given a partial phylogeny P_p and a complete phylogeny P_c, LB_n(P_p, P_c) is admissible, i.e., for every completion P of P_p,

$$
\mathit{LB}_n(P_p, P_c) \le D_n(P, P_c).
$$
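A small Python sketch of D_n and LB_n follows. It rests on assumptions that should be checked against Section 6.1.1: the nodal distance ND_P(x, y) is taken to be the number of edges on the path between leaves x and y, D_n is taken to sum |ND_P(x, y) − ND_P'(x, y)| over all leaf pairs (the same terms that LB_n sums over L_p), and the partial phylogeny is represented, hypothetically, only by the nodal distances it already fixes for the pairs in L_p.

```python
from itertools import combinations
from typing import Dict, Tuple, Union

Tree = Union[str, tuple]  # hypothetical: nested tuples of leaf labels
Pair = Tuple[str, str]


def leaf_paths(tree: Tree, path: tuple = ()) -> Dict[str, tuple]:
    """Map each leaf to the sequence of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths: Dict[str, tuple] = {}
    for idx, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (idx,)))
    return paths


def nodal_distances(tree: Tree) -> Dict[Pair, int]:
    """ND(x, y) for every leaf pair with x < y: number of edges on the path from x to y."""
    paths = leaf_paths(tree)
    nd: Dict[Pair, int] = {}
    for x, y in combinations(sorted(paths), 2):
        px, py = paths[x], paths[y]
        common = 0  # depth of the lowest common ancestor of x and y
        while common < min(len(px), len(py)) and px[common] == py[common]:
            common += 1
        nd[(x, y)] = len(px) + len(py) - 2 * common
    return nd


def nodal_tree_distance(t1: Tree, t2: Tree) -> int:
    """Assumed D_n: sum of nodal-distance differences over all leaf pairs."""
    nd1, nd2 = nodal_distances(t1), nodal_distances(t2)
    return sum(abs(nd1[pair] - nd2[pair]) for pair in nd1)


def lower_bound_nodal(partial_nd: Dict[Pair, int], complete: Tree) -> int:
    """LB_n: the same differences, but only over the pairs L_p fixed by the partial phylogeny."""
    nd_c = nodal_distances(complete)
    return sum(abs(d - nd_c[pair]) for pair, d in partial_nd.items())


p1 = ("a", ("b", "c"))
p2 = ("b", ("a", "c"))
print(nodal_distances(p1))                   # {('a', 'b'): 3, ('a', 'c'): 3, ('b', 'c'): 2}
print(nodal_tree_distance(p1, p2))           # 2
print(lower_bound_nodal({("a", "c"): 3}, p2))  # 1, which does not exceed 2
```

Because LB_n sums a subset of the nonnegative terms of D_n, it cannot exceed D_n as long as the fixed pairwise distances do not change in the completion.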

Similarly, we can define an upper bound for the nodal distance measure as follows:

$$
\mathit{UB}_n(P_p, P_c) = \sum_{(x,y) \in L_p} \bigl| \mathit{ND}_{P_p}(x,y) - \mathit{ND}_{P_c}(x,y) \bigr| + \bigl( \ldots \bigr) \times l,
$$

where l denotes the number of leaves in the complete tree.

This upper bound does not underestimate the distance between a phylogeny and any completion of a partial phylogeny.

Proposition 4

Given a partial phylogeny P_p and a complete phylogeny P_c, UB_n(P_p, P_c) is admissible, i.e., for every completion P of P_p, UB_n(P_p, P_c) ≥ D_n(P, P_c).

As regards the descendant distance measure, we could not find tight lower and upper bounds. In our experiments, we take the lower bound (resp. upper bound) between a complete phylogeny and any completion of a partial phylogeny to be 0 (resp. ∞).



6.3 Experimental Results for Phylogeny Reconstruction 



We applied the offline method and the online methods described in Section 5.1 to reconstruct similar/diverse phylogenies for Indo-European languages. We used the dataset assembled by Don Ringe and Ann Taylor (Ringe et al. 2002). As in (Brooks et al. 2007), to compute similar/diverse phylogenies, we considered the language groups Balto-Slavic (BS), Italo-Celtic (IC), Greco-Armenian (GA), Anatolian (AN), Tocharian (TO), Indo-Iranian (IIR), Germanic (GE), and the language Albanian (AL). While computing phylogenies, we also took into account some domain-specific information about these languages.

In our experiments, we considered the distance measures described in Section 6.1, as in Section 6.2.

Below, all CPU times are in seconds, for a workstation with a 1.5GHz Xeon processor and 4×512MB RAM, running Red Hat Enterprise Linux (Version 4.3).



Experiments with the Nodal Distance Let us first examine the results of experiments considering the distance measure Δ_n, based on the nodal distance (Table 2). We present the results for the following computations: 2 most similar solutions, 2 most diverse solutions, 3 most similar solutions, 3 most diverse solutions, and 6 most similar solutions. We solve these optimization problems by iteratively solving the corresponding decision problems (n k-SIMILAR/DIVERSE SOLUTIONS). For each method, we present the computation time, the size of the memory used in computation, and the value of k found; the optimal value of k is given in the Problem column.

Let us first compare the online methods. In terms of both computation time and memory size, Online Method 3 performs the best, and Online Method 2 performs better than Online Method 1. These results conform with our expectations. Online Method 1 takes as input an ASP representation of computing n k-similar/diverse phylogenies, which is almost n times as large as the ASP program describing the phylogeny reconstruction problem used in the other methods. Therefore, its computational performance may not be as good as that of the other online methods. Online Method 2 relaxes this requirement a little so that the answer set solver can compute the solutions more efficiently: it takes as input an ASP representation of phylogeny reconstruction and an ASP representation of the distance measure, and then computes similar/diverse solutions one at a time. However, since the answer set solver needs to compute the distances between every two solutions, the computation time and the size of memory do not decrease much compared to those for Online Method 1. Online Method 3 deals with the time-consuming computation of distances between solutions not at the representation level but at the search level. In that sense, its computational performance is better than that of Online Method 2.

Table 2. Computing similar/diverse phylogenies using the nodal distance Δ_n.

                                      Online Methods
Problem          Offline Method       Reformulation        Iterative Comp.      Incremental Comp.
                                      (CLASP)              (CLASP, perl)        (CLASP-NK)
2 most similar   12.39 sec., 32MB     26.23 sec., 430MB    19.00 sec., 410MB    1.46 sec., 12MB
(k = 12)         k = 12               k = 12               k = 12               k = 12
2 most diverse   11.81 sec., 32MB     21.75 sec., 430MB    18.41 sec., 410MB    1.01 sec., 15MB
(k = 32)         k = 32               k = 32               k = 24               k = 32
3 most similar   11.59 sec., 32MB     60.20 sec., 730MB    43.56 sec., 626MB    1.56 sec., 15MB
(k = 15)         k = 15               k = 15               k = 15               k = 16
3 most diverse   11.91 sec., 32MB     46.32 sec., 730MB    44.67 sec., 626MB    0.96 sec., 15MB
(k = 26)         k = 26               k = 26               k = 21               k = 26
6 most similar   11.66 sec., 32MB     327.28 sec., 1.8GB   178.96 sec., 1.2GB   1.96 sec., 15MB
(k = 25)         k = 25               k = 25               k = 29               k = 25

The offline method takes into account the previously computed 8 phylogenies for Indo-European languages (with at most 17 incompatible characters), and computes similar/diverse solutions using ASP as explained in Section 3. The offline method is more efficient, in terms of both computation time and memory, than Online Methods 1 and 2, since it does not compute phylogenies. On the other hand, the offline method is less efficient, in terms of both computation time and memory, than Online Method 3, since it requires both representation and computation of distances between solutions.

Here both the offline method and Online Method 1 guarantee finding optimal solutions by iteratively solving the corresponding decision problems. On the other hand, Online Methods 2 and 3 compute similar/diverse solutions with respect to the first computed solution, and thus may not find the optimal value for k, as observed, for instance, in the computation of 3 most similar phylogenies.

Experiments with the Descendant Distance Now, let us consider the distance measure Δ_l, based on preference over diversifications (Table 3): two phylogenies are more similar if the diversifications closer to the root are more similar. Here we consider the similarities of diversifications until depth 3 (inclusive). We present the results for the following computations: 2 most similar solutions, 3 most diverse solutions, and 6 most similar solutions. In Table 3, for each method, we present the computation time, the size of the memory used in computation, and the value of k found. Unlike what we have observed in Table 2, the offline method takes more time and space to compute similar/diverse solutions; this is due to the computation of distances with respect to Δ_l, which requires summations. Other results are similar to the ones presented in Table 2.

Accuracy Let us compare the phylogenies computed by different distance measures in terms of accuracy. In (Brooks et al. 2007), after computing all 34 plausible phylogenies, the authors examine them manually, come up with three forms of tree structures, and then "filter" the phylogenies with respect to these tree structures. For instance, in Group 1, the trees are of the form

(AN, (TO, (AL, (IC, (a tree formed for GE, GA, BS, IIR))))); 
in Group 2, the trees are of the form 

(AN, (TO, (IC, (a tree formed for GE, GA, BS, IIR, AL)))); 
in Group 3, the trees are of the form 

(AN, (TO, ((AL, IC), (a tree formed for GE, GA, BS, IIR)))). 

The results of our experiments with the distance measure Δ_l comply with these groupings. For instance, the 2 most similar phylogenies computed by Online Method 1 are

(AN, (TO, (IC, ((GE, AL), (GA, (IIR, BS)))))),
(AN, (TO, (IC, ((GE, AL), (BS, (IIR, GA))))))

(Phylogenies 7 and 8 of (Brooks et al. 2007)); both are in Group 2. The 3 most diverse phylogenies computed by Online Method 2 are

(AN, (TO, (IC, (AL, (GE, (GA, (IIR, BS))))))),
(AN, (TO, (AL, (IC, (GE, (GA, (IIR, BS))))))),
(AN, (TO, ((GE, (GA, (IIR, BS))), (AL, IC))))

(Phylogenies 10, 1, 40 of (Brooks et al. 2007)); they are all in different groups. Likewise, the 6 most similar phylogenies computed by our methods fall into Group 2.

These results (in terms of computational efficiency and accuracy) show the effective- 
ness of our methods in phylogeny reconstruction: we can automatically compare many 
phylogenies in detail. 



7 Computing Similar/Diverse Plans 

In order to show the applicability and effectiveness of our methods to other domains, we 
extend our experiments further with the Blocks World planning problems. 

In these experiments, we study the following instance of n k-SIMILAR SOLUTIONS (resp. n k-DIVERSE SOLUTIONS):
Table 3. Computing similar/diverse phylogenies using the descendant distance Δ_l.

                                      Online Methods
Problem          Offline Method       Reformulation        Iterative Comp.      Incremental Comp.
                                      (CLASP)              (CLASP, perl)        (CLASP-NK)
2 most similar   365.16 sec., 4.2GB   16.11 sec., 236MB    16.23 sec., 212MB    0.635 sec., 22MB
(k = 18)         k = 18               k = 18               k = 18               k = 18
3 most diverse   368.59 sec., 4.2GB   46.11 sec., 659MB    44.21 sec., 430MB    1.014 sec., 22MB
(k = 20)         k = 20               k = 20               k = 20               k = 20
6 most similar   368.45 sec., 4.2GB   137.31 sec., 1.8GB   212.59 sec., 1.1GB   0.685 sec., 22MB
(k = 18)         k = 18               k = 18               k = 18               k = 20



n k-SIMILAR PLANS (RESP. n k-DIVERSE PLANS)

Given an ASP program 𝒫 that formulates a planning problem P, a distance measure Δ_h that maps a set of plans for P to a nonnegative integer, and two nonnegative integers n and k, decide whether a set S of n plans for P exists such that Δ_h(S) ≤ k (resp. Δ_h(S) ≥ k).



We take 𝒫 as the ASP formulation of the non-concurrent Blocks World as in (Erdem 2002) to compute a plan of length at most l (Fig. B 14 in Appendix B), together with an ASP description of the planning problem instance shown in Fig. 5. We define the distance Δ_h(S) of a set S of plans as follows:

$$
\Delta_h(S) = \max\{ D_h(P_1,P_2) \mid P_1, P_2 \in S,\ |P_1| \le |P_2| \}
$$

based on the action-based Hamming distance D_h defined in (Srivastava et al. 2007) to measure the distance between two plans. Intuitively, D_h(P_1, P_2) is the number of time steps at which the two plans P_1 and P_2 perform different actions. More precisely: let us denote a plan X of length l by a function act_X that maps every integer i (1 ≤ i ≤ l) to the i-th action of the plan X, and let us denote by |X| the length of a plan X; then the Hamming distance D_h(P_1, P_2) between two plans P_1 and P_2 such that |P_1| ≤ |P_2| can be defined as follows:

$$
D_h(P_1,P_2) = \bigl|\{ i \mid \mathit{act}_{P_1}(i) \neq \mathit{act}_{P_2}(i),\ 1 \le i \le |P_1| \}\bigr| + |P_2| - |P_1|.
$$

ASP formulations of the distance functions D_h and Δ_h(S) are presented in Figs. B 16 and B 17 in Appendix B.

Consider, for instance, a planning problem in the Blocks World that asks for a plan of length less than or equal to 7. Consider two plans, P_1 and P_2, that are characterized by the functions act_{P_1} and act_{P_2}, respectively, as follows:

$$
\begin{array}{ll}
\mathit{act}_{P_1}(1) = \mathit{moveop}(b_1, \mathit{Table}) & \mathit{act}_{P_2}(1) = \mathit{moveop}(b_1, \mathit{Table})\\
\mathit{act}_{P_1}(2) = \mathit{moveop}(b_2, b_1) & \mathit{act}_{P_2}(2) = \mathit{moveop}(b_2, b_1)\\
\mathit{act}_{P_1}(3) = \mathit{moveop}(b_4, \mathit{Table}) & \mathit{act}_{P_2}(3) = \mathit{moveop}(b_4, b_5)\\
\mathit{act}_{P_1}(4) = \mathit{moveop}(b_3, b_2) & \mathit{act}_{P_2}(4) = \mathit{moveop}(b_3, b_2)\\
\mathit{act}_{P_1}(5) = \mathit{moveop}(b_4, b_3) & \mathit{act}_{P_2}(5) = \mathit{moveop}(b_4, \mathit{Table})\\
\mathit{act}_{P_1}(6) = \mathit{moveop}(b_5, b_4) & \mathit{act}_{P_2}(6) = \mathit{moveop}(b_4, b_3)\\
 & \mathit{act}_{P_2}(7) = \mathit{moveop}(b_5, b_4)
\end{array}
$$

The distance D_h(P_1, P_2) between P_1 and P_2 is 4, since the actions at time steps 3, 5 and 6 are different and P_2 has an additional action (at time step 7).
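The definition of D_h and the example above can be checked with a short Python sketch. Plans are represented, hypothetically, as lists of action strings, so that act_X(i) is the list element at index i − 1; the action strings mirror the example, but only their equality matters for the distance.

```python
from itertools import combinations
from typing import List, Sequence

Plan = List[str]  # hypothetical: act_X(i) is plan[i - 1]


def hamming(p1: Plan, p2: Plan) -> int:
    """D_h: differing actions over the shorter plan's steps, plus the extra steps of the longer plan."""
    if len(p1) > len(p2):
        p1, p2 = p2, p1  # the definition assumes |P1| <= |P2|
    differing = sum(1 for a, b in zip(p1, p2) if a != b)
    return differing + len(p2) - len(p1)


def delta_h(plans: Sequence[Plan]) -> int:
    """Δ_h(S): the maximum D_h over pairs of plans in S."""
    return max(hamming(p, q) for p, q in combinations(plans, 2))


P1 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,Table)",
      "moveop(b3,b2)", "moveop(b4,b3)", "moveop(b5,b4)"]
P2 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,b5)", "moveop(b3,b2)",
      "moveop(b4,Table)", "moveop(b4,b3)", "moveop(b5,b4)"]
print(hamming(P1, P2))   # 4: steps 3, 5 and 6 differ, plus step 7 of P2
print(delta_h([P1, P2]))  # 4
```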

To be able to apply our Online Method 3 with CLASP-NK to compute n k-similar plans of length at most l, we define a heuristic function LB_h to estimate a lower bound for the distance between a plan P_c and any plan-completion of a "partial" plan P_p. Intuitively, a partial plan consists of parts of a plan. Let us characterize a partial plan P_p by a partial function act_{P_p} from {1, ..., l} to the set of actions; that is, act_{P_p} is a function from a subset of {1, ..., l} to the set of actions. A plan-completion of a partial plan P_p is a plan Y of length l' (l' ≤ l) for the planning problem P such that act_Y is an extension of act_{P_p} to {1, ..., l'}. Then we can define LB_h(P_p, P_c) for a partial plan P_p and a plan P_c as follows:

$$
\mathit{LB}_h(P_p,P_c) = \bigl|\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_c}(i),\ i \in \mathrm{dom}\,\mathit{act}_{P_p},\ 1 \le i \le |P_c| \}\bigr| + \bigl|\{ i \mid i \in \mathrm{dom}\,\mathit{act}_{P_p},\ |P_c| < i \le l \}\bigr|
$$

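In the Blocks World example above, consider a partial plan P_p characterized by the function act_{P_p} as follows:

$$
\begin{array}{ll}
\mathit{act}_{P_p}(2) = \mathit{moveop}(b_2, b_1) & \mathit{act}_{P_p}(4) = \mathit{moveop}(b_3, b_2)\\
\mathit{act}_{P_p}(5) = \mathit{moveop}(b_4, \mathit{Table}) & \mathit{act}_{P_p}(7) = \mathit{moveop}(b_5, b_4)
\end{array}
$$

The lower bound LB_h(P_p, P_1) for the distance between any completion of P_p and P_1 is computed as follows:

$$
\begin{aligned}
\mathit{LB}_h(P_p,P_1) &= \bigl|\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_1}(i),\ i \in \mathrm{dom}\,\mathit{act}_{P_p},\ 1 \le i \le 6 \}\bigr| + \bigl|\{ i \mid i \in \mathrm{dom}\,\mathit{act}_{P_p},\ 6 < i \le 7 \}\bigr|\\
&= |\{5\}| + |\{7\}| = 2.
\end{aligned}
$$

One completion of P_p is P_2. Note that LB_h(P_p, P_1) ≤ D_h(P_1, P_2). Indeed, the following proposition expresses that LB_h does not overestimate the distance between P_c and any plan-completion X of P_p.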

Proposition 5 

For a partial plan P_p and a plan P_c for the planning problem P, LB_h(P_p, P_c) is admissible.
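The lower bound and its admissibility on the example can be illustrated with a short Python sketch. As an assumption of this sketch, a partial plan is a dictionary from the steps in dom act_{P_p} to actions, plans are lists of action strings, and D_h is repeated so the block runs on its own; none of these names come from CLASP-NK.

```python
from typing import Dict, List

Plan = List[str]              # hypothetical: act_X(i) is plan[i - 1]
PartialPlan = Dict[int, str]  # hypothetical: maps each i in dom act_Pp to an action


def lower_bound(pp: PartialPlan, pc: Plan, l: int) -> int:
    """LB_h(Pp, Pc): fixed steps that disagree with Pc, plus fixed steps after |Pc| (up to l)."""
    disagree = sum(1 for i, a in pp.items() if 1 <= i <= len(pc) and a != pc[i - 1])
    beyond = sum(1 for i in pp if len(pc) < i <= l)
    return disagree + beyond


def hamming(p1: Plan, p2: Plan) -> int:
    """D_h, repeated here so that the admissibility check below is self-contained."""
    if len(p1) > len(p2):
        p1, p2 = p2, p1
    return sum(1 for a, b in zip(p1, p2) if a != b) + len(p2) - len(p1)


P1 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,Table)",
      "moveop(b3,b2)", "moveop(b4,b3)", "moveop(b5,b4)"]
P2 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,b5)", "moveop(b3,b2)",
      "moveop(b4,Table)", "moveop(b4,b3)", "moveop(b5,b4)"]
Pp = {2: "moveop(b2,b1)", 4: "moveop(b3,b2)", 5: "moveop(b4,Table)", 7: "moveop(b5,b4)"}

# P2 agrees with Pp on every step in dom act_Pp, so it is one plan-completion of Pp.
assert all(P2[i - 1] == a for i, a in Pp.items())
print(lower_bound(Pp, P1, l=7))                     # 2
print(lower_bound(Pp, P1, l=7) <= hamming(P1, P2))  # True
```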

Similarly, to be able to apply our Online Method 3 with CLASP-NK to compute n k-diverse plans of length at most l, we define a heuristic function UB_h(P_p, P_c) to estimate an upper bound for the distance between a plan P_c and any plan-completion of P_p:

$$
\mathit{UB}_h(P_p,P_c) = l - \bigl|\{ i \mid \mathit{act}_{P_p}(i) = \mathit{act}_{P_c}(i),\ i \in \mathrm{dom}\,\mathit{act}_{P_p},\ 1 \le i \le |P_c| \}\bigr|
$$
Fig. 5. Blocks World problem.



For instance, for the partial plan P_p and the plan P_1 above,

$$
\mathit{UB}_h(P_p,P_1) = 7 - \bigl|\{ i \mid \mathit{act}_{P_p}(i) = \mathit{act}_{P_1}(i),\ i \in \mathrm{dom}\,\mathit{act}_{P_p},\ 1 \le i \le 6 \}\bigr| = 7 - |\{2,4\}| = 7 - 2 = 5
$$

and UB_h(P_p, P_1) ≥ D_h(P_1, P_2). Indeed, the following proposition expresses that this upper bound function does not underestimate the distance between P_c and any plan-completion X of P_p.

Proposition 6 

For a partial plan P_p and a plan P_c for the planning problem P, UB_h(P_p, P_c) is admissible.
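The upper bound can be sketched in the same way. Again the dictionary and list representations are assumptions of this sketch, and D_h is repeated so that the check against the example runs on its own.

```python
from typing import Dict, List

Plan = List[str]              # hypothetical: act_X(i) is plan[i - 1]
PartialPlan = Dict[int, str]  # hypothetical: maps each i in dom act_Pp to an action


def upper_bound(pp: PartialPlan, pc: Plan, l: int) -> int:
    """UB_h(Pp, Pc): l minus the fixed steps (within |Pc|) that already agree with Pc."""
    agree = sum(1 for i, a in pp.items() if 1 <= i <= len(pc) and a == pc[i - 1])
    return l - agree


def hamming(p1: Plan, p2: Plan) -> int:
    """D_h, repeated here so that the admissibility check below is self-contained."""
    if len(p1) > len(p2):
        p1, p2 = p2, p1
    return sum(1 for a, b in zip(p1, p2) if a != b) + len(p2) - len(p1)


P1 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,Table)",
      "moveop(b3,b2)", "moveop(b4,b3)", "moveop(b5,b4)"]
P2 = ["moveop(b1,Table)", "moveop(b2,b1)", "moveop(b4,b5)", "moveop(b3,b2)",
      "moveop(b4,Table)", "moveop(b4,b3)", "moveop(b5,b4)"]
Pp = {2: "moveop(b2,b1)", 4: "moveop(b3,b2)", 5: "moveop(b4,Table)", 7: "moveop(b5,b4)"}

print(upper_bound(Pp, P1, l=7))                     # 5 = 7 - |{2, 4}|
print(upper_bound(Pp, P1, l=7) >= hamming(P1, P2))  # True
```

Steps that already agree with P_c can never contribute to D_h, which is why subtracting them from the maximum plan length l keeps the bound admissible.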

We performed some experiments with the ASP formulation, planning problem, and distance measures above, to find 2 most similar plans, 2 most diverse plans, 3 most similar plans, 3 most diverse plans, and 6 most similar plans. Table 4 summarizes the results of these experiments.

As can be observed, the planning problem in Fig. 5 has too many solutions (more than 50,000), so it is intractable to compute all of them in advance and then the distances between all pairs of solutions. Therefore, instead of computing all the solutions in advance, we compute a subset of them (around 200), which is small enough to construct a distance graph, and apply our Offline Method in this way. However, these 200 solutions are not diverse enough, and thus, although we can find many very similar solutions, it is hard to find diverse solutions; for instance, we can find 6 1-similar solutions but only 3 6-diverse solutions.

Online Method 1 performs the worst in comparison with the other online methods, as in our experiments with phylogeny reconstruction problems, due to the large ASP program (Fig. B 15 in Appendix B) used for computing n distinct plans.



Online Method 2 is comparable with Online Method 3 in terms of computing similar solutions. After computing a solution, computing a 1-similar solution has a very small search space, and CLASP can find a similar solution in a short time. On the other hand, computing a 21-diverse solution has a huge search space. Therefore, the performance of computing diverse solutions with Online Method 2 is worse than that of Online Method 3.
Table 4. Computing similar/diverse plans for the blocks world problem. OM denotes "Out of memory."

                                  Online Methods
Problem          Offline Method   Reformulation            Iterative Comp.          Incremental Comp.
                                  (CLASP)                  (CLASP, perl)            (CLASP-NK)
2 most similar   OM               6 min. 45 sec., 106 MB   6 min. 53 sec., 73 MB    7 min. 17 sec., 111 MB
(k = 1)                           k = 1                    k = 1                    k = 1
2 most diverse   OM               33 min. 28 sec., 213 MB  11 min., 73 MB           7 min. 40 sec., 112 MB
(k = 22)                          k = 22                   k = 22                   k = 21
3 most similar   OM               7 min. 5 sec., 141 MB    7 min. 3 sec., 73 MB     7 min. 21 sec., 112 MB
(k = 1)                           k = 1                    k = 1                    k = 2
3 most diverse   OM               78 min. 42 sec., 333 MB  18 min. 49 sec., 73 MB   12 min. 40 sec., 167 MB
(k = 22)                          k = 22                   k = 21                   k = 21
6 most similar   OM               64 min. 42 sec., 584 MB  7 min. 32 sec., 73 MB    7 min. 18 sec., 112 MB
(k = 1)                           k = 1                    k = 1                    k = 2



Online Method 3 deals with the Hamming distance computation at the search level. In addition, it does not restart the search process to compute a new solution; instead, it learns the conflicts caused by distance differences while computing a new solution and backtracks to appropriate levels to compute similar/diverse solutions. In particular, for the computation of diverse solutions, such a search strategy yields a significant performance gain.



8 Related Work 

Finding similar/diverse solutions has been studied in other areas, such as propositional logic (Bailleux and Marquis 1999), constraint programming (Hebrard et al. 2007; Hebrard et al. 2005), and planning (Srivastava et al. 2007). Let us briefly describe related work in each area, and discuss the similarities and differences compared with our approach.



Related Work in Propositional Logic In (Bailleux and Marquis 1999), Bailleux and Marquis study the following problem:

DISTANCE-SAT

Given a CNF formula Σ, a partial interpretation PI, and a nonnegative integer d, decide whether there is a model I of Σ such that I disagrees with PI on at most d atoms.
This problem is similar to k-CLOSE SOLUTION in that it asks for a k-close solution. On the other hand, it asks for a solution k-close to a partial solution, whereas k-CLOSE SOLUTION asks for a solution that is k-close to a set of previously computed solutions. Also, DISTANCE-SAT considers a distance measure (i.e., partial Hamming distance) computable in polynomial time, whereas k-CLOSE SOLUTION considers any distance measure such that deciding whether the distance of a set of solutions is less than a given k is in NP. Despite these differences, with S containing a single solution and Δ being a partial Hamming distance, k-CLOSE SOLUTION becomes essentially the same as DISTANCE-SAT.
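To make the connection concrete, here is a brute-force Python sketch of DISTANCE-SAT with the partial Hamming distance. The CNF encoding (clauses as lists of signed integers, in the style of DIMACS) and all function names are assumptions of this sketch; enumerating all 2^n interpretations only illustrates what is being decided, not how Bailleux and Marquis solve it.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]               # signed integers: 3 means x3, -3 means not x3
Interpretation = Dict[int, bool]


def partial_hamming(pi: Interpretation, model: Interpretation) -> int:
    """Number of atoms assigned by PI on which the (possibly partial) model disagrees."""
    return sum(1 for v, b in pi.items() if v in model and model[v] != b)


def satisfies(cnf: List[Clause], model: Interpretation) -> bool:
    """True if every clause contains at least one literal made true by the model."""
    return all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)


def distance_sat(cnf: List[Clause], num_vars: int, pi: Interpretation,
                 d: int) -> Optional[Interpretation]:
    """Return a model of the CNF that disagrees with PI on at most d atoms, or None."""
    for bits in product([False, True], repeat=num_vars):
        model = {v: bits[v - 1] for v in range(1, num_vars + 1)}
        if satisfies(cnf, model) and partial_hamming(pi, model) <= d:
            return model
    return None


cnf = [[1, 2], [-1, -2]]   # exactly one of x1, x2 is true
pi = {1: True, 2: True}    # PI sets both atoms to true
print(distance_sat(cnf, 2, pi, 0))  # None
print(distance_sat(cnf, 2, pi, 1))  # {1: False, 2: True}
```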

As for the computational complexity analysis, Proposition 1 of (Bailleux and Marquis 1999) shows that in the general case DISTANCE-SAT is NP-complete. However, determining whether Σ has a model that disagrees with a complete interpretation I on at most d variables, where d is a constant, is in P.

To solve DISTANCE-SAT, the authors propose two algorithms, DP_distance and DP_distance+lasso. Our modification of CLASP's algorithm is similar to the first algorithm in that both algorithms check whether a partial interpretation computed in the DPLL-like search obeys the given distance constraints. On the other hand, unlike DP_distance, CLASP also uses conflict-driven learning: when it learns a conflicting set of literals, it will never try to select them in the later stages of the search. DP_distance+lasso manipulates the selection of a new variable: it creates a set of candidate variables with respect to the distance function, computes weights of these variables relative to the distance function, and selects one with the maximum weight. On the other hand, in SELECT, CLASP creates a set of candidate variables and selects one of the candidates to continue the search. Using the idea of DP_distance+lasso, we can modify CLASP further to manipulate the selection of variables with respect to the distance function. However, in the phylogeny reconstruction problem, the domain of the distance function consists of the edge atoms, which are far outnumbered by auxiliary atoms, so the set of candidate variables in SELECT generally consists of auxiliary variables only; for this reason, manipulating the selection of variables is not expected to improve the computational efficiency.
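The shared idea of checking the distance constraint on partial interpretations during a DPLL-like search can be sketched as a plain backtracking procedure in Python. This is a simplified illustration under our own assumptions (no unit propagation, no conflict-driven learning, a fixed variable order, and PI's value tried first); it is neither DP_distance nor the CLASP-NK implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

Clause = List[int]               # signed integers, as in DIMACS
Interpretation = Dict[int, bool]


def partial_hamming(pi: Interpretation, assignment: Interpretation) -> int:
    """Disagreements with PI among the atoms assigned so far."""
    return sum(1 for v, b in pi.items() if v in assignment and assignment[v] != b)


def falsified(clause: Clause, assignment: Interpretation) -> bool:
    """A clause is falsified once all of its literals are assigned and false."""
    return all(abs(lit) in assignment and assignment[abs(lit)] != (lit > 0) for lit in clause)


def search(cnf: List[Clause], num_vars: int, pi: Interpretation, d: int,
           assignment: Optional[Interpretation] = None, var: int = 1) -> Optional[Interpretation]:
    if assignment is None:
        assignment = {}
    # Prune as soon as the partial interpretation violates the distance bound ...
    if partial_hamming(pi, assignment) > d:
        return None
    # ... or falsifies a clause.
    if any(falsified(c, assignment) for c in cnf):
        return None
    if var > num_vars:
        return dict(assignment)  # complete, and no clause falsified: a model
    preferred = pi.get(var, False)
    for value in (preferred, not preferred):  # try PI's value first
        assignment[var] = value
        result = search(cnf, num_vars, pi, d, assignment, var + 1)
        if result is not None:
            return result
        del assignment[var]
    return None


cnf = [[1, 2], [-1, -2]]
pi = {1: True, 2: True}
print(search(cnf, 2, pi, 0))  # None
print(search(cnf, 2, pi, 1))  # {1: True, 2: False}
```

Checking the bound at every node lets the search abandon a branch as soon as too many atoms already disagree with PI, instead of waiting for a complete model.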

Related Work in Constraint Programming In (Hebrard et al. 2007; Hebrard et al. 2005), the authors study various computational problems related to finding similar/diverse solutions. The main decision problems studied in (Hebrard et al. 2005) are the following:

dDISTANTkSET (resp. dCLOSEkSET)

Given a polynomial-time decidable and polynomially balanced relation R over strings, a symmetric, reflexive, total and polynomially bounded distance function δ between strings, and some string x, decide whether there is a set S of k strings (i.e., S ⊆ {y | (x, y) ∈ R}) such that $\min_{y,z \in S} \delta(y,z) \ge d$ (resp. $\max_{y,z \in S} \delta(y,z) \le d$).



which are similar to d-DISTANT SET (resp. d-CLOSE SET): (Hebrard et al. 2005) asks for a set of k solutions d-distant/close to one solution, whereas d-DISTANT/CLOSE SET asks for a set of solutions that is d-distant/close to a set of previously computed solutions. Also, the distance measure considered in dDISTANTkSET (resp. dCLOSEkSET) is computable in polynomial time, whereas in d-DISTANT SET (resp. d-CLOSE SET), deciding whether the distance of a set of solutions is less than a given d is assumed to be in NP.

The main decision problems studied in (Hebrard et al. 2007) are the following:

dDISTANT (resp. dCLOSE)

Given a constraint satisfaction problem P with variables ranging over finite domains, a symmetric, reflexive, total and polynomially bounded distance function δ between partial instantiations of variables, and some partial instantiation p of variables of P, decide whether there is a solution s of P such that δ(p, s) ≥ d (resp. δ(p, s) ≤ d).

which are similar to d-DISTANT SOLUTION and d-CLOSE SOLUTION. On the other hand, (Hebrard et al. 2007) asks for a solution d-distant (resp. d-close) to a partial solution rather than to a set of solutions. Also, the distance measure considered in these problems is computable in polynomial time. However, with S containing a single solution and Δ being computable in polynomial time, d-DISTANT SOLUTION (resp. d-CLOSE SOLUTION) becomes essentially the same as dDISTANT (resp. dCLOSE).

The authors also study some optimization problems related to these problems, similar to the ones that we study above.

As for the computational complexity analysis of these problems, the authors find that they are all NP-complete; these results comply with the ones presented in Section 4, subject to conditions under which the problems of (Hebrard et al. 2005; Hebrard et al. 2007) above are equivalent to the problems we study in this paper.



Considering partial Hamming distance as in (Bailleux and Marquis 1999), Hebrard et al. present an offline method (similar to our Offline Method) that applies clustering methods, and two online methods: one based on reformulation (similar to Online Method 1), the other based on a greedy algorithm (similar to Online Method 2) that iteratively computes a solution that maximizes similarity to previous solutions. The computation of a k-close solution is due to a Branch & Bound algorithm (similar to the idea behind Online Method 3) that propagates some similarity/diversity constraints specific to the given distance function. Our offline/online methods are inspired by these methods of (Hebrard et al. 2007; Hebrard et al. 2005).
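The greedy strategy described above, which also underlies Online Method 2, can be sketched in Python over an explicit list of candidate solutions. The function name and the integer toy data are assumptions of this sketch; the first call also shows how fixing the first solution can miss the optimum (it picks 0 and 5 at distance 5, although 5 and 6 are only 1 apart).

```python
from typing import Callable, List, Sequence, TypeVar

Sol = TypeVar("Sol")


def greedy_select(candidates: Sequence[Sol], dist: Callable[[Sol, Sol], int],
                  n: int, similar: bool = True) -> List[Sol]:
    """Start from the first candidate and repeatedly add the best-scoring remaining one."""
    chosen = [candidates[0]]          # plays the role of the first computed solution
    remaining = list(candidates[1:])

    def score(c: Sol) -> int:
        ds = [dist(c, s) for s in chosen]
        # similar: worst-case distance to the chosen set; diverse: distance to the closest chosen one
        return max(ds) if similar else min(ds)

    while len(chosen) < n and remaining:
        pick = min(remaining, key=score) if similar else max(remaining, key=score)
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


toy = [0, 5, 6, 10]
print(greedy_select(toy, lambda x, y: abs(x - y), 2))                 # [0, 5]
print(greedy_select(toy, lambda x, y: abs(x - y), 3, similar=False))  # [0, 10, 5]
```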

We note that partial Hamming distance is not unrelated to the distance measures introduced for comparing phylogenies in Section 6.1: one can polynomially reduce nodal distance to partial Hamming distance, and vice versa also partial Hamming distance to nodal distance of trees (allowing auxiliary atoms in the LP encoding).



Related Work in Planning In (Srivastava et al. 2007), the authors study the following decision problem:

dDISTANTkSET (resp. dCLOSEkSET)

Find a set S of k plans for a planning problem PP such that $\min_{y,z \in S} \delta(y,z) \ge d$ (resp. $\max_{y,z \in S} \delta(y,z) \le d$).

The authors study these problems considering domain-independent distance measures computable in polynomial time (such as Hamming distance or set difference). They present a method (similar to our Online Method 1) in which they add global constraints to the underlying constraint satisfaction solver of the GP-CSP planner (Do and Kambhampati 2001). As another method, they present a greedy approach (similar to our Online Method 2) in which they add global constraints that force the solver to compute k-diverse solutions in each iteration until it computes n solutions. They also present a method (similar to our Online Method 3) that modifies the heuristic function of an existing planner (Gerevini et al. 2003) and computes n k-similar solutions at the search level.

Advantages of using ASP-Based Methods/Tools Our ASP-based methods for computing 
similar/diverse or close/distant solutions to a given problem have three main advantages 
compared to other approaches: 

• they are not restricted to some domain-independent distance function, like the (partial) Hamming distance considered in all the methods/tools above;

• depending on the particular ASP-based method, we can represent domain-independent or domain-specific distance functions in ASP or implement them in C++;

• we can use the definitions of distance functions modularly, without modifying the main problem description, the search algorithm, or the implementation of the solver.

Thus, our ASP-based methods/tools for computing similar/diverse or close/distant solu- 
tions are applicable to various problems with different (often domain-specific) distance 
measures. 

In that sense, a user may prefer our ASP-based methods/tools for computing similar/diverse or close/distant solutions to a given problem over the SAT-based and CP-based methods/tools if the user considers a domain-specific distance function but does not want to modify the CP/SAT solvers to be able to compute similar/diverse or distant/close solutions.

Also, our ASP-based methods/tools may be preferred when it is easier to represent the main problem in ASP, due to advantages inherited from the expressive representation language of ASP, such as being able to define the transitive closure. Some sample applications include phylogenetic network reconstruction (Erdem et al. 2006) and wire routing (Coban et al. 2008; Erdem and Wong 2004).

ASP-based methods for computing similar/diverse or close/distant solutions to a given problem are probably most useful for existing well-studied ASP applications, such as phylogeny reconstruction (Brooks et al. 2005; Brooks et al. 2007) or product configuration (Soininen and Niemelä 1998), to be used with domain-specific measures.



9 Conclusion 

We have studied two kinds of computational problems related to finding similar/diverse solutions of a given problem, in the context of ASP: one problem asks for a set of n k-similar (resp. k-diverse) solutions; the other asks, given a set of solutions, for a k-close (resp. k-distant) solution. We have analyzed the computational complexity of these problems, and introduced offline/online methods to solve them. We have applied these offline/online methods to the phylogeny reconstruction problem, and observed their effectiveness in computing similar/diverse phylogenies for Indo-European languages. Similarly, we have applied these methods to planning problems, and observed their effectiveness in computing, in particular, diverse plans in the Blocks World. Finally, we have compared our work with related approaches based on other formalisms.
There are many appealing ASP applications for which finding similar/diverse solutions could be useful. In this sense, our methods and implementation (i.e., clasp-nk) can be useful for ASP. On the other hand, no existing phylogenetic system can compute similar/diverse phylogenies. In this sense, our distance functions, methods, and tools can be useful for phylogenetics. Similarly, since no planner can compute similar/diverse plans with respect to a domain-specific measure, our methods and tools can be useful for planning. In general, the ASP-based methods/tools can be useful for finding similar/diverse or close/distant solutions to a problem in two cases: when representing the problem in ASP is easier (e.g., if the problem involves recursive definitions like transitive closure), or when the distance measure is different from the Hamming distance (which is implemented in the SAT/CP-based tools).

We are also interested in combinations of the problems studied above (for instance, finding a phylogeny that is the most distant from a given set of phylogenies and that is the closest to another given set of phylogenies), and in applying our methods to other problems. The study of these problems is left for future work.

Acknowledgments 

We are grateful to the reviewers of the paper as well as the reviewers of the preliminary 
conference version for their comments and constructive suggestions for improvement, in 
particular regarding the computation of nodal distances. 

References 

Bafna, V. and Pevzner, P. 1998. Sorting by transpositions. SIAM Journal on Discrete Mathematics 11, 224-240.

Bailleux, O. and Marquis, P. 1999. DISTANCE-SAT: complexity and algorithms. In Proc. of AAAI. 642-647.

Bluis, J. and Shin, D.-G. 2003. Nodal distance algorithm: Calculating a phylogenetic tree comparison metric. In Proc. of BIBE. 87.

Bodenreider, O., Çoban, Z. H., Doganay, M. C., Erdem, E., and Koşucu, H. 2008. A preliminary report on answering complex queries related to drug discovery using answer set programming. In Proc. of ALPSWS.

Brooks, D. and McLennan, D. 1991. Phylogeny, Ecology, and Behavior: A Research Program in Comparative Biology. University of Chicago Press, Chicago, IL.

Brooks, D. R., Erdem, E., Erdogan, S. T., Minett, J. W., and Ringe, D. 2007. Inferring phylogenetic trees using answer set programming. J. Autom. Reasoning 39, 4, 471-511.

Brooks, D. R., Erdem, E., Minett, J. W., and Ringe, D. 2005. Character-based cladistics and answer set programming. In Proc. of PADL. 37-51.

Caldiran, O., Haspalamutgil, K., Ok, A., Palaz, C., Erdem, E., and Patoglu, V. 2009. Bridging the gap between high-level reasoning and low-level control. In Proc. of LPNMR.

Camin, J. and Sokal, R. 1965. A method for deducing branching sequences in phylogeny. Evolution 19, 3, 311-326.

Chafle, G., Dasgupta, K., Kumar, A., Mittal, S., and Srivastava, B. 2006. Adaptation in web service composition and execution. In Proc. of ICWS. 549-557.

Chen, Z.-Z. and Toda, S. 1995. The complexity of selecting maximal solutions. Information and Computation 119, 231-239.



Coban, E., Erdem, E., and Ture, F. 2008. Comparing ASP, CP, ILP on two challenging applications: Wire routing and haplotype inference. In Proc. of the 2nd International Workshop on Logic and Search (LaSh 2008).

DasGupta, B., He, X., Jiang, T., Li, M., Tromp, J., and Zhang, L. 1997. On distances between phylogenetic trees. In Proc. of SODA. 427-436.

Davis, M., Logemann, G., and Loveland, D. 1962. A machine program for theorem-proving. Communications of the ACM 5, 394-397.

Do, M. B. and Kambhampati, S. 2001. Planning as constraint satisfaction: Solving the planning graph by compiling it into CSP. Artificial Intelligence 132, 2, 151-182.

Edwards, A. and Cavalli-Sforza, L. 1964. Phenetic and Phylogenetic Classification, 67-76.

Eiter, T. and Subrahmanian, V. 1999. Heterogeneous active agents, II: Algorithms and complexity. Artif. Intell. 108, 1-2, 257-307.

Erdem, E. 2002. Theory and applications of answer set programming. Ph.D. thesis, Department of Computer Sciences, University of Texas at Austin.

Erdem, E. 2009. PHYLO-ASP: Phylogenetic systematics with answer set programming. In Proc. of LPNMR. 567-572.

Erdem, E., Lifschitz, V., and Ringe, D. 2006. Temporal phylogenetic networks and logic programming. Theory and Practice of Logic Programming 6, 5, 539-558.

Erdem, E. and Wong, M. D. F. 2004. Rectilinear Steiner Tree construction using answer set programming. In Proc. of ICLP. 386-399.

Felsenstein, J. 2009. PHYLIP. http://evolution.genetics.washington.edu/phylip.html

Ferraris, P. and Lifschitz, V. 2005. Weight constraints as nested expressions. Theory and Practice of Logic Programming 5, 45-74.
Gebser, M., Kaufmann, B., Neumann, A., and Schaub, T. 2007a. clasp: A conflict-driven answer set solver. In Proc. of LPNMR. 260-265.

Gebser, M., Kaufmann, B., Neumann, A., and Schaub, T. 2007b. Conflict-driven answer set solving. In Proc. of IJCAI. 386-392.

Gelfond, M. and Lifschitz, V. 1991. Classical negation in logic programs and disjunctive databases. New Generation Computing 9, 365-385.

Gerevini, A., Saetti, A., and Serina, I. 2003. Planning through stochastic local search and temporal action graphs in LPG. J. Artif. Int. Res. 20, 1, 239-290.

Gutin, G. 2003. Handbook of Graph Theory. CRC Press, Chapter 5.3 Independent sets and cliques, 389-402.

Hamming, R. W. 1950. Error detecting and error correcting codes. Bell System Technical Journal 29, 2, 147-160.

Hebrard, E., Hnich, B., O'Sullivan, B., and Walsh, T. 2005. Finding diverse and similar solutions in constraint programming. In Proc. of AAAI. 372-377.

Hebrard, E., O'Sullivan, B., and Walsh, T. 2007. Distance constraints in constraint satisfaction. In Proc. of IJCAI. 106-111.

Hon, W.-K., Kao, M.-Y., and Lam, T.-W. 2000. Algorithms and Computation. Springer Berlin / Heidelberg, Chapter Improved Phylogeny Comparisons: Non-shared Edges, Nearest Neighbor Interchanges, and Subtree Transfers, 369-382.

Kautz, H. A. and Selman, B. 1992. Planning as satisfiability. In Proc. of ECAI. 359-363.

Kuhner, M. and Felsenstein, J. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates [published erratum appears in Mol Biol Evol 1995 May;12(3):525]. Mol Biol Evol 11, 3 (May), 459-468.

Lifschitz, V. 1999. Action languages, answer sets and planning. In The Logic Programming Paradigm: a 25-Year Perspective. Springer Verlag, 357-373.



Lifschitz, V. 2008. What is answer set programming? In Proc. of AAAI. 1594-1597.

Lifschitz, V. 2010. Thirteen definitions of a stable model. In Fields of Logic and Computation. 488-503.

Lifschitz, V., Tang, L. R., and Turner, H. 1999. Nested expressions in logic programs. Annals of Mathematics and Artificial Intelligence 25, 369-389.

Marek, W. and Remmel, J. 2003. On the expressibility of stable logic programming. Theory and Practice of Logic Programming 3, 4-5, 551-567.

Marques-Silva, J. and Sakallah, K. 1999. A search algorithm for propositional satisfiability. IEEE Trans. Computers 5, 506-521.

McIlraith, S. A. and Son, T. C. 2002. Adapting Golog for composition of semantic web services. In Proc. of KR. 482-496.

Nogueira, M., Balduccini, M., Gelfond, M., Watson, R., and Barry, M. 2001. An A-Prolog decision support system for the space shuttle. In Proc. of PADL. Springer-Verlag, London, UK, 169-183.

Nye, T. M., Lio, P., and Gilks, W. R. 2006. A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics 22, 1 (January), 117-119.

Papadimitriou, C. 1994. Computational Complexity. Addison-Wesley.

Ringe, D., Warnow, T., and Taylor, A. 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100, 1, 59-129.

Robinson, D. F. and Foulds, L. R. 1981. Comparison of phylogenetic trees. Mathematical Biosciences 53, 1-2 (February), 131-147.

Simons, P., Niemelä, I., and Soininen, T. 2002. Extending and implementing the stable model semantics. Artificial Intelligence 138, 181-234.

Soininen, T. and Niemelä, I. 1998. Developing a declarative rule language for applications in product configuration. In Proc. of PADL. 305-319.

Son, T., Pontelli, E., and Sakama, C. 2009. Logic programming for multiagent planning with negotiation. In Proc. of ICLP. 99-114.

Srivastava, B., Nguyen, T. A., Gerevini, A., Kambhampati, S., Do, M. B., and Serina, I. 2007. Domain independent approaches for finding diverse plans. In Proc. of IJCAI. 2016-2022.

Swofford, D. L. 2003. PAUP*: Phylogenetic analysis under parsimony (and other methods). Version 4.0. Sinauer Associates, Sunderland, Mass.

White, J. and O'Connell, J. 1982. A Prehistory of Australia, New Guinea, and Sahul. Academic, San Diego, CA.



Appendix A Proofs of Theorems 

Proof of Theorem 1

Membership: Consider a non-deterministic Turing machine $M$ operating on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), $n$, and $k$. $M$ guesses $S$ as a set $\{s_1, \ldots, s_n\}$ of $n$ interpretations over the alphabet of $\mathcal{P}$, together with a potential witness $w$ for a computation of $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$) of length polynomial in $n$. After that, for $1 \leq i \leq n$, $M$ checks whether $s_i$ is an answer set of $\mathcal{P}$ and whether all $s_i$ represent distinct solutions of the problem. It rejects if any of these tests does not succeed. Otherwise, $M$ proceeds by verifying that $w$ is a witness of $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$) given $S$ and $k$ as its input. If so, then $M$ accepts; otherwise it rejects. Since $n$ is polynomial in the size of the input to $M$, this also holds for the guess of $M$. Moreover, the subsequent computation of $M$, i.e., the tests carried out, can be accomplished in time polynomial in $n$. Therefore, $M$ is a non-deterministic Turing machine which decides n k-SIMILAR SOLUTIONS (respectively n k-DIVERSE SOLUTIONS) in polynomial time, which proves NP-membership for these problems.

Hardness: Let $\phi = \bigwedge_{1 \leq i \leq l} c_i$ be a Boolean formula in conjunctive normal form (CNF) over variables $B = \{b_1, \ldots, b_m\}$, i.e., each $c_i$ is a clause over variables from $B$. By $\bar{x}$ we denote the complement of a literal $x$, i.e., $\bar{x} = \neg b$ if $x = b$, and $\bar{x} = b$ if $x = \neg b$. This notation is extended to clauses in the obvious way: $\bar{c} = \bar{x}_1 \wedge \cdots \wedge \bar{x}_k$ for a clause $c = x_1 \vee \cdots \vee x_k$.

Consider the normal logic program
$$\mathcal{P} = \{\, b_i \leftarrow \mathit{not}\ nb_i;\ \ nb_i \leftarrow \mathit{not}\ b_i \mid 1 \leq i \leq m \,\} \cup \{\, \leftarrow \bar{c}_i' \mid 1 \leq i \leq l \,\},$$
where $\bar{c}'$ denotes the conjunction obtained from $\bar{c}$ by replacing negative literals $\neg x$ in $\bar{c}$ by $nx$ (and using ',' instead of '$\wedge$'). It is easily verified (and well known) that $\mathcal{P}$ has an answer set iff $\phi$ is satisfiable (and that the answer sets of $\mathcal{P}$ are in 1-to-1 correspondence with the satisfying assignments of $\phi$ in the obvious way).

Given $\mathcal{P}$, consider the n k-SIMILAR SOLUTIONS (respectively n k-DIVERSE SOLUTIONS) problem, where $n = 1$, $k = 0$, and for any set $S$ of answer sets of $\mathcal{P}$, the distance measure $\Delta$ is defined by $\Delta(S) = 0$. Note that $\Delta$ is normal and computable in constant time. Then, there exists a solution to the problem iff there exists a set $S$ of answer sets of $\mathcal{P}$ such that $|S| = 1$, i.e., $\mathcal{P}$ has an answer set. Since $\mathcal{P}$ has an answer set iff $\phi$ is satisfiable, this proves NP-hardness of the n k-SIMILAR SOLUTIONS (respectively n k-DIVERSE SOLUTIONS) problem. Note that this argument holds for any normal $\Delta$. □
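To make the construction of $\mathcal{P}$ from $\phi$ concrete, the following Python sketch emits the rules of $\mathcal{P}$ as text in the gringo-style syntax of Appendix B. It makes three assumptions that are not from the paper: clauses use a DIMACS-like encoding (the integer $i$ stands for $b_i$ and $-i$ for $\neg b_i$), atoms are named b1, nb1, and so on, and the function name `cnf_to_program` is hypothetical.

```python
def cnf_to_program(num_vars, clauses):
    """Translate a CNF formula into the normal program P of the hardness proof.

    clauses: list of clauses, each a list of non-zero ints (DIMACS-style):
             i stands for b_i and -i for the negated literal.
    Returns the rules of P as strings in gringo-style syntax.
    """
    rules = []
    # Guess a truth value for every variable: b_i :- not nb_i.  nb_i :- not b_i.
    for i in range(1, num_vars + 1):
        rules.append(f"b{i} :- not nb{i}.")
        rules.append(f"nb{i} :- not b{i}.")
    # One constraint per clause whose body is the complemented clause:
    # a positive literal b_i becomes nb_i, a negative literal becomes b_i.
    for clause in clauses:
        body = ", ".join(f"nb{lit}" if lit > 0 else f"b{-lit}" for lit in clause)
        rules.append(f":- {body}.")
    return rules


print("\n".join(cnf_to_program(2, [[1, -2]])))
```

For the clause $b_1 \vee \neg b_2$, the sketch produces the constraint `:- nb1, b2.`. This constraint discards exactly the guesses that falsify the clause.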

Proof of Theorem 2

Membership: Consider a non-deterministic Turing machine $M$ operating on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), a set $S$ of solutions given by answer sets of $\mathcal{P}$, and $k$. $M$ guesses an interpretation $s$ over the alphabet of $\mathcal{P}$ (which is polynomial in the size of $\mathcal{P}$), together with a potential witness $w$ for a computation of $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$) of length polynomial in $|S| + |s| + \log k$. After that, $M$ checks whether $s$ is an answer set of $\mathcal{P}$ and whether it represents a solution different from all solutions in $S$. It rejects if any of these tests does not succeed. Otherwise, $M$ proceeds by verifying that $w$ is a witness of $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$) on input $S \cup \{s\}$ and $k$. If so, then $M$ accepts; otherwise it rejects. The guess of $M$ is polynomial in its input, and the subsequent computation of $M$, i.e., the tests carried out, can be accomplished in polynomial time. Therefore, $M$ is a non-deterministic Turing machine which decides k-CLOSE SOLUTION (respectively k-DISTANT SOLUTION) in polynomial time, which proves NP-membership for these problems.
Hardness: Consider the normal logic program $\mathcal{P}$ given in the proof of Theorem 1 and the k-CLOSE SOLUTION problem, where $S = \emptyset$, $k = 0$, and for any set $S'$ of answer sets of $\mathcal{P}$, the distance measure $\Delta$ is defined by $\Delta(S') = 0$. Note that $\Delta$ is normal and computable in constant time. Then, there exists a solution to the problem iff there exists a set $S'$ of answer sets of $\mathcal{P}$ such that $S' \neq \emptyset$, i.e., $\mathcal{P}$ has an answer set. This proves NP-hardness of the k-CLOSE SOLUTION problem. Similarly, the k-DISTANT SOLUTION problem, where $S = \emptyset$, $k = 0$, and $\Delta(S') = 0$, has a solution iff $\mathcal{P}$ has an answer set. Moreover, the above arguments hold for any normal $\Delta$. This proves the claim. □

Proof of Theorem 3

Membership: The problem of computing the cardinality of a maximal solution $S$ of size at most $n$ is an optimization problem for a problem in NP such that the optimal value can be represented using $\log n$ bits. Let $M_{opt}$ be an oracle for this problem, and consider a non-deterministic Turing machine $M'$ with output tape operating on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), $n$, and $k$. Initially, $M'$ calls $M_{opt}$ with $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), $n$, and $k$ as input to compute the maximum cardinality $c$ of a set of solutions $S$ such that $|S| \leq n$. Then, $M'$ proceeds like the nondeterministic Turing machine $M$ in the proof of Theorem 1 using $n = c$, additionally writing the guessed solution $S$ to its output tape. Since the latter is accomplished in time polynomial in $c$, $M'$ is a non-deterministic Turing machine with output tape that consults an oracle once for computing the optimal value of an optimization problem solvable in NP. Thus, $M'$ is in FNP//log and decides MAXIMAL n k-SIMILAR SOLUTIONS (respectively MAXIMAL n k-DIVERSE SOLUTIONS).

Hardness: We reduce X-MinModel to the problem of computing MAXIMAL n k-SIMILAR SOLUTIONS. X-MinModel is the following FNP//log-complete problem. Given a Boolean formula $\phi$ in CNF as in the proof of Theorem 1 and a subset $X$ of $B$, compute an $X$-minimal model of $\phi$. This is a satisfying assignment for $\phi$ whose set of variables from $X$ assigned true is subset minimal among all satisfying assignments for $\phi$. We identify a truth assignment with the set of Boolean variables that are assigned true, and for a truth assignment $s$, we use $s|_X$ to denote its restriction to variables from $X$.

Consider the normal logic program $\mathcal{P}$ given in the proof of Theorem 1 and the MAXIMAL n k-SIMILAR SOLUTIONS problem, where $n = |X| + 1$, $k = 0$, and $\Delta$ is defined as follows. For a given set $S$ of answer sets of $\mathcal{P}$ such that $|S| > 0$, we set $\Delta(S) = 0$ if for every pair of distinct answer sets $s_1, s_2$ in $S$, either $s_1|_X \subset s_2|_X$ or $s_2|_X \subset s_1|_X$. Otherwise (and if $S = \emptyset$), $\Delta(S) = 1$. Note that $\Delta$ is computable in polynomial time, performing fewer than $2|S|^2$ checks for proper containment. Observe also that the answer sets in a set $S$ such that $\Delta(S) = 0$ can be strictly ordered wrt. subset inclusion on their restrictions to $X$. Since a strict chain of subsets of $X$ has at most $|X| + 1$ elements, $|X| + 1$ is the maximum cardinality for such a set of answer sets. Given $S$ such that $\Delta(S) = 0$, let $s_1$ denote the minimal answer set in $S$ wrt. subset inclusion on the restriction to $X$. The following is trivial: the MAXIMAL n k-SIMILAR SOLUTIONS problem above has a solution iff $\phi$ is satisfiable. By the problem definition, it also holds for every solution $S$ of the problem that $\Delta(S) = 0$.

We show that if $S$ is a solution of the MAXIMAL n k-SIMILAR SOLUTIONS problem given above, then $s_1$ is an $X$-minimal model of $\phi$. Towards a contradiction, assume that there exists a satisfying assignment $s'$ for $\phi$ such that $s'|_X \subset s_1|_X$. Consider $s_0 = s' \cup \{ nb \mid b \in B, b \notin s' \}$. Since $s'$ satisfies $\phi$, $s_0$ is an answer set of $\mathcal{P}$. Moreover, $s_0 \notin S$, since $s_0|_X \subset s_1|_X$ and $s_1$ is the minimal answer set in $S$ wrt. subset inclusion on the restriction to $X$. As a consequence, $S \subsetneq S \cup \{s_0\}$ and $\Delta(S \cup \{s_0\}) = 0$. However, since the latter also implies $|S \cup \{s_0\}| \leq n$, this contradicts our assumption that $S$ is a solution of the MAXIMAL n k-SIMILAR SOLUTIONS problem above. We have thus shown that no satisfying assignment $s'$ for $\phi$ exists such that $s'|_X \subset s_1|_X$, i.e., that $s_1$ is an $X$-minimal model of $\phi$. This completes the reduction of X-MinModel to the problem of computing MAXIMAL n k-SIMILAR SOLUTIONS, proving FNP//log-hardness.

For a reduction of X-MinModel to the problem of computing MAXIMAL n k-DIVERSE SOLUTIONS, we simply swap the values of $\Delta$ (and use $k = 1$). We define $\Delta(S) = 1$ if $|S| > 0$ and for every pair of distinct answer sets $s_1, s_2$ in $S$, either $s_1|_X \subset s_2|_X$ or $s_2|_X \subset s_1|_X$. Otherwise (and if $S = \emptyset$), $\Delta(S) = 0$. FNP//log-hardness follows by analogous arguments. □
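The distance measure used in this reduction only checks whether the restrictions to $X$ form a strict chain under set inclusion. The following Python sketch implements the similar variant. It assumes answer sets are given as sets of atom names and $X$ as a set of variable names; the function name `chain_delta` is hypothetical.

```python
from itertools import combinations


def chain_delta(answer_sets, X):
    """Delta from the X-MinModel reduction (MAXIMAL n k-SIMILAR variant).

    Returns 0 if S is non-empty and the restrictions s|_X of its members are
    pairwise comparable under strict inclusion, and 1 otherwise.
    """
    restrictions = [frozenset(s) & frozenset(X) for s in answer_sets]
    if not restrictions:
        return 1
    for a, b in combinations(restrictions, 2):
        # On Python sets, < is the proper-subset test.
        if not (a < b or b < a):
            return 1
    return 0


X = {"b1", "b2"}
print(chain_delta([{"b1", "nb2"}, {"b1", "b2"}], X))   # {b1} is a proper subset of {b1, b2}
print(chain_delta([{"b1", "nb2"}, {"nb1", "b2"}], X))  # {b1} and {b2} are incomparable
```

The first call returns 0 and the second returns 1. Each pair costs at most two containment tests, which stays within the stated bound of fewer than $2|S|^2$ checks. Swapping the two return values gives the measure for the MAXIMAL n k-DIVERSE SOLUTIONS variant.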

Proof of Theorem 4

Membership: Consider a deterministic Turing machine $M'$ with output tape and an oracle for NP-problems, which operates on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $n$. Initially, $M'$ prepares an integer $k_1$ of $n$ bits, with the less significant half of the bits set to 1 and the remaining bits set to 0. Then, $M'$ successively uses its oracle, operating as the nondeterministic Turing machine $M$ in the proof of Theorem 1. Starting with input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), $n$, and $k_1$, it performs a binary search for an optimal $k$. After that, $M'$ uses its oracle once more like the nondeterministic Turing machine $M$ in the proof of Theorem 1, but additionally copies the solution $S$ guessed by the oracle to its output tape. The latter is accomplished in time polynomial in $n$, and a polynomial number of calls to the oracle is sufficient to complete the binary search. Hence $M'$ is in $\mathrm{FP}^{\mathrm{NP}}$ and decides n MOST SIMILAR SOLUTIONS (respectively n MOST DIVERSE SOLUTIONS).

If the value of $\Delta(S)$ is polynomial in the size of $S$, then the problem of computing the optimal value of $\Delta(S)$ over all solutions $S$ is an optimization problem for a problem in NP such that the optimal value can be represented using $\log n$ bits. Let $M_{opt}$ be an oracle for this problem, and consider a non-deterministic Turing machine $M''$ with output tape operating on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $n$. Initially, $M''$ calls $M_{opt}$ with $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $n$ as input to compute the value $k$ of $\Delta(S)$ for an optimal solution $S$. Then, $M''$ proceeds like the nondeterministic Turing machine $M$ in the proof of Theorem 1, additionally writing the guessed solution $S$ to its output tape. Since the latter is accomplished in time polynomial in $n$, $M''$ is in FNP//log and decides n MOST SIMILAR SOLUTIONS (respectively n MOST DIVERSE SOLUTIONS).

Hardness: We reduce the Traveling Salesman Problem (TSP) (as, e.g., in Papadimitriou 1994) to the problem of computing n MOST SIMILAR SOLUTIONS. Consider $m$ cities $1, \ldots, m$, and a non-negative integer distance $d_{ij}$ between any two cities $i$ and $j$. The task is to compute a tour visiting all cities once (i.e., a Hamilton cycle) of shortest length.

For a reduction, consider
$$\mathcal{P} = \{\, p_{ij} \leftarrow \mathit{not}\ np_{ij};\ \ np_{ij} \leftarrow \mathit{not}\ p_{ij};\ \ r_j \leftarrow p_{ij};\ \ \leftarrow \mathit{not}\ r_j \mid i \neq j \,\} \cup \{\, \leftarrow p_{ij}, p_{kj};\ \ \leftarrow p_{ij}, p_{ik} \mid i \neq j,\ i \neq k,\ j \neq k \,\},$$
where indices $i$, $j$, and $k$ range over $1, \ldots, m$. Every answer set $s$ of $\mathcal{P}$ uniquely corresponds to a Hamilton cycle encoded by the atoms $p_{ij}$ in $s$, and every permutation of the cities gives rise to exactly one answer set of $\mathcal{P}$. This can easily be verified by observing that the first two rules encode a nondeterministic guess of a set of atoms $p_{ij}$. The third and fourth rules are satisfied iff 'every city is reached', i.e., iff for every index $j$ there exists an index $i$ such that $p_{ij}$ is true. The last two rules are satisfied iff every city 'is reached from at most one different city' and 'reaches at most one different city'. That is, for different indices $i$, $j$, and $k$, $p_{ij}$ and $p_{kj}$ cannot both be true, and $p_{ij}$ and $p_{ik}$ cannot both be true.

Given this program, consider the n MOST SIMILAR SOLUTIONS problem, where $n = 1$, and for any set $S$ of answer sets of $\mathcal{P}$, the distance measure $\Delta$ is defined by $\Delta(S) = \sum_{s \in S} \sum_{p_{ij} \in s} d_{ij}$. Note that $\Delta$ is monotonic and computable in polynomial time in the size of $S$. Moreover, by definition, $S$ is a solution to this problem iff $S$ contains exactly one answer set $s$ of $\mathcal{P}$ and $\Delta(S)$ is minimal among all such sets, i.e., among all singleton sets of answer sets of $\mathcal{P}$. By the definition of $\Delta$, this implies that $S = \{s\}$ is a solution iff $s$ encodes a Hamilton cycle of minimal cost. This proves $\mathrm{FP}^{\mathrm{NP}}$-hardness for the n MOST SIMILAR SOLUTIONS problem in general.

For a reduction of TSP to the problem of computing n MOST DIVERSE SOLUTIONS, consider $\Delta'(S) = m \times \mathit{maxd} - \Delta(S)$, where $\mathit{maxd}$ is the maximum distance $d_{ij}$ given. $\Delta'$ is also monotonic and computable in polynomial time, and by analogous arguments $\mathrm{FP}^{\mathrm{NP}}$-hardness follows for the n MOST DIVERSE SOLUTIONS problem in the general case.

If the value of $\Delta(S)$ is polynomial in the size of $S$, then FNP//log-hardness is obtained by a reduction of X-MinModel. Let $\mathcal{P}$ be the normal logic program in the proof of Theorem 1. Consider the n MOST SIMILAR SOLUTIONS (respectively n MOST DIVERSE SOLUTIONS) problem, where $n = 1$ and $\Delta(S)$ is the minimal (respectively maximal) partial Hamming distance on $X$ between an answer set $s \in S$ and $\emptyset$ (respectively $X$). It is a straightforward consequence of the definition of $\Delta$ that if $S = \{s\}$ is a solution to this n MOST SIMILAR SOLUTIONS problem (respectively to this n MOST DIVERSE SOLUTIONS problem), then $s$ is an $X$-minimal model of $\phi$ (cf. also the proof of Theorem 3). □
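The binary search for an optimal $k$ in the membership part of this proof can be illustrated in a few lines of Python. The sketch assumes a hypothetical monotone predicate `exists_with_bound(k)`, standing in for the NP oracle that answers whether some set of $n$ solutions satisfies $\Delta(S) \le k$. It is a standard binary search over $[0, 2^{\text{bits}} - 1]$ using $O(\text{bits})$ oracle calls. It does not try to reproduce the exact starting value $k_1$ chosen in the proof.

```python
def min_bound(exists_with_bound, bits):
    """Smallest k in [0, 2**bits - 1] with exists_with_bound(k) true, or None.

    exists_with_bound is assumed monotone: if it holds for k, it holds for
    every k' >= k (as for "is there a set S of n solutions with Delta(S) <= k").
    """
    lo, hi = 0, (1 << bits) - 1
    if not exists_with_bound(hi):
        return None  # no set of n solutions within the representable range
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_with_bound(mid):
            hi = mid      # some set satisfies Delta(S) <= mid: search below
        else:
            lo = mid + 1  # none does: the optimum is larger than mid
    return lo


# Toy oracle pretending that the optimal value of Delta is 37.
print(min_bound(lambda k: k >= 37, bits=8))  # 37
```

For n MOST DIVERSE SOLUTIONS, the same search runs in the other direction. It looks for the largest $k$ such that some set of $n$ solutions satisfies $\Delta(S) \ge k$.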

Proof of Theorem 5

Membership: Consider a deterministic Turing machine $M'$ with output tape and an oracle for NP-problems, which operates on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $S$. Initially, $M'$ prepares an integer $k_1$ of $n$ bits, with the less significant half of the bits set to 1 and the remaining bits set to 0. Then, $M'$ successively uses its oracle, operating as the nondeterministic Turing machine $M$ in the proof of Theorem 2. Starting with input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), $S$, and $k_1$, it performs a binary search for an optimal $k$. After that, $M'$ uses its oracle once more like the nondeterministic Turing machine $M$ in the proof of Theorem 2, but additionally copies the answer set $s$ guessed by the oracle to its output tape. The latter is accomplished in time polynomial in $n$, and a polynomial number of calls to the oracle is sufficient to complete the binary search. Hence $M'$ is in $\mathrm{FP}^{\mathrm{NP}}$ and decides CLOSEST SOLUTION (respectively MOST DISTANT SOLUTION).

If the value of $\Delta(S)$ is polynomial in the size of a set $S$ of $n$ solutions, consider computing the optimal value of $\Delta(S \cup \{s\})$ over all solutions $S \cup \{s\}$. This is an optimization problem for a problem in NP, and its optimal value can be represented using logarithmically many bits in the size of $S \cup \{s\}$. Let $M_{opt}$ be an oracle for this problem, and consider a non-deterministic Turing machine $M''$ with output tape operating on input $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $S$. Initially, $M''$ calls $M_{opt}$ with $\mathcal{P}$, $M_\Delta^{\leq}$ (resp. $M_\Delta^{\geq}$), and $S$ as input to compute the value $k$ of $\Delta(S \cup \{s\})$ for an optimal solution $S \cup \{s\}$. Then, $M''$ proceeds like the nondeterministic Turing machine $M$ in the proof of Theorem 2, additionally writing the guessed answer set $s$ to its output tape. Since the latter is accomplished in time polynomial in the input, $M''$ is in FNP//log and decides CLOSEST SOLUTION (respectively MOST DISTANT SOLUTION).

Hardness: The respective lower bounds are an immediate consequence of the problem reductions in the proof of the previous Theorem 4. Observe that for given $\mathcal{P}$ and $\Delta$, the solutions of an n MOST SIMILAR SOLUTIONS problem with input $n = 1$ coincide with the solutions of the CLOSEST SOLUTION problem with input $S = \emptyset$. The same holds for the problem n MOST DIVERSE SOLUTIONS with input $n = 1$ and MOST DISTANT SOLUTION with input $S = \emptyset$. It thus suffices to recall that the reductions mentioned above are reductions to problems with input $n = 1$. □

Proof of Theorem 6

Membership: Consider a non-deterministic Turing machine $M$ operating on input $\mathcal{P}$, $M_\Delta^{\leq}$, $M_\Delta^{\geq}$, a set $S$ of answer sets of $\mathcal{P}$, and $k$. Let $n$ be the size of its input. First, $M$ guesses the following:

- a set $S' = \{s'_1, \ldots, s'_m\}$ of interpretations over the alphabet of $\mathcal{P}$, such that $|S'|$ is polynomial in $n$;
- two integers $k_1$ and $k_2$ in binary representation of size at most polynomial in $n$;
- two potential witnesses $w_1$ and $w_3$ of $M_\Delta^{\leq}$, of length polynomial in $|S|$ and $|S'|$, respectively;
- two potential witnesses $w_2$ and $w_4$ of $M_\Delta^{\geq}$, of length polynomial in $|S|$ and $|S'|$, respectively.

After that, $M$ checks whether $S'$ is different from $S$, as well as whether $s'_i$ is an answer set of $\mathcal{P}$ for $1 \leq i \leq m$. It rejects if any of these tests does not succeed. Otherwise, $M$ verifies that $w_1$ is a witness of $M_\Delta^{\leq}$ on input $S$ and $k_1$, and that $w_2$ is a witness of $M_\Delta^{\geq}$ on input $S$ and $k_1$. It likewise verifies that $w_3$ is a witness of $M_\Delta^{\leq}$ on input $S'$ and $k_2$, and that $w_4$ is a witness of $M_\Delta^{\geq}$ on input $S'$ and $k_2$. Together, these witnesses certify $\Delta(S) = k_1$ and $\Delta(S') = k_2$. If any of these tests fails, $M$ rejects; otherwise it checks whether $|k_1 - k_2| \leq k$ (respectively $|k_1 - k_2| \geq k$), and if so accepts, otherwise it rejects. We assume that the size of the sets $S'$ to consider is polynomial in $n$. We further assume that $\Delta(S)$, respectively $\Delta(S')$, is bounded by an exponential in the size of $S$, respectively of $S'$. Under these assumptions, the guess of $M$, which is polynomial in $n$, is sufficient for deciding the problem. Moreover, the subsequent computation of $M$, i.e., the tests carried out, can be accomplished in polynomial time. Therefore, $M$ is a non-deterministic Turing machine which decides k-CLOSE SET (respectively k-DISTANT SET) in polynomial time, which proves NP-membership for these problems.

Hardness: Consider the normal logic program $\mathcal{P}$ given in the proof of Theorem 1 and the k-CLOSE SET problem, where $S = \emptyset$, $k = 0$, and for any set $S'$ of answer sets of $\mathcal{P}$, the distance measure $\Delta$ is defined by $\Delta(S') = 0$. Note that $\Delta$ is normal and computable in constant time. Then, there exists a solution to the problem iff there exists a set $S'$ of answer sets of $\mathcal{P}$ such that $S' \neq \emptyset$, i.e., $\mathcal{P}$ has an answer set. This proves NP-hardness of the k-CLOSE SET problem. Similarly, the k-DISTANT SET problem, where $S = \emptyset$, $k = 0$, and $\Delta(S') = 0$, has a solution iff $\mathcal{P}$ has an answer set. Again the arguments hold for any normal $\Delta$, which proves the claim. □

Proof of Proposition 1

In order to compute $D_n(P_1, P_2)$, we need to perform $\binom{|L|}{2}$ nodal distance computations, where $|L|$ is the number of leaves. The nodal distance $ND_P(x, y)$ between leaves $x$ and $y$ in a phylogeny $P$ can be computed as
$$ND_P(x, y) = \mathit{depth}_P(x) + \mathit{depth}_P(y) - 2 \times \mathit{depth}_P(\mathit{lca}_P(x, y)),$$
where $\mathit{lca}_P(x, y)$ is the lowest common ancestor of $x$ and $y$ in $P$. Suppose $\mathit{depth}_P(v)$ is given for all vertices $v$ in $P$, which is computable in $O(|L|)$ time since $P$ is a binary tree. Then computing $ND_P(x, y)$ takes constant time once $\mathit{lca}_P(x, y)$ is known. Computing $ND_P(x, y)$ for all leaves $x, y$ in $P$ is therefore possible in $O(|L|^2)$ time. In a standard post-order traversal of $P$, a called node $v$ always fulfills $v = \mathit{lca}_P(x, y)$ for any vertices $x, y$ that occur in different subtrees rooted at children of $v$. Thus, let each call return all leaves of $P$ reached from $v$; this has overall cost $O(|L|^2)$. In the setting above, the traversal can then calculate $ND_P(x, y)$ for all leaves $x, y$ of $P$. In total, the time to compute $ND_{P_1}(x, y)$ and $ND_{P_2}(x, y)$ for all $x, y \in L$ is $2 \times O(|L|) + O(|L|^2) = O(|L|^2)$. Therefore, in total $D_n(P_1, P_2)$ is computable in $O(|L|^2)$ time. □
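The post-order computation described in this proof can be sketched in Python as follows. The sketch makes two assumptions. First, a rooted tree is given as a dictionary from each internal vertex to its list of children, and vertices without an entry are leaves. Second, $D_n(P_1, P_2)$ sums $|ND_{P_1}(x,y) - ND_{P_2}(x,y)|$ over all unordered pairs of distinct leaves. The function names are illustrative only.

```python
from itertools import combinations


def nodal_distances(children, root):
    """Map each unordered leaf pair frozenset({x, y}) to ND(x, y)."""
    depth = {root: 0}
    nd = {}

    def visit(v):
        # Post-order traversal that returns the leaves below v.
        kids = children.get(v, [])
        if not kids:
            return [v]
        groups = []
        for c in kids:
            depth[c] = depth[v] + 1
            groups.append(visit(c))
        # v is the lowest common ancestor of any two leaves that lie in
        # subtrees of different children of v.
        for left, right in combinations(groups, 2):
            for x in left:
                for y in right:
                    nd[frozenset((x, y))] = depth[x] + depth[y] - 2 * depth[v]
        return [x for g in groups for x in g]

    visit(root)
    return nd


def nodal_distance(tree1, root1, tree2, root2):
    """D_n(P1, P2), assuming both trees have the same leaf set."""
    nd1 = nodal_distances(tree1, root1)
    nd2 = nodal_distances(tree2, root2)
    return sum(abs(nd1[pair] - nd2[pair]) for pair in nd1)


# P1 = ((a, b), c) and P2 = ((a, c), b), with internal vertices r and u.
p1 = {"r": ["u", "c"], "u": ["a", "b"]}
p2 = {"r": ["u", "b"], "u": ["a", "c"]}
print(nodal_distance(p1, "r", p2, "r"))  # 2
```

Each leaf pair is assigned exactly once, at its lowest common ancestor. The pair loops therefore do $O(|L|^2)$ work overall, in line with the bound in the proof.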

Proof of Proposition 2

Let $v$ be the number of vertices in one tree. Then $v^2$ is an upper bound for the number of pairs of vertices whose descendants we compare, so we have at most $O(v^2)$ comparisons.

We first obtain the descendants of each vertex by preprocessing in $O(v \cdot |L|)$ time. The number of descendants of a vertex is then bounded by $|L|$, so each comparison takes time $O(|L|)$.

Since $v = 2 \times |L| - 1$, $D_l(P_1, P_2)$ can be computed in $(2 \times |L| - 1)^2 \times |L|$ steps, which is $O(|L|^3)$. □

Proof of Proposition 3

Let $S_p$ be the set of all completions of the partial phylogeny $P_p$. For every $P \in S_p$, we need to prove that
$$LB_n(P_p, P_c) \leq D_n(P, P_c)$$
holds.

Let $P_l \in \arg\min_{P \in S_p} D_n(P, P_c)$ be a completion with smallest distance. Then it is enough to prove that
$$LB_n(P_p, P_c) \leq D_n(P_l, P_c)$$
holds. If we replace $LB_n$ and $D_n$ with their definitions, the inequality becomes
$$\sum_{x,y \in L_p} |ND_{P_p}(x,y) - ND_{P_c}(x,y)| \leq \sum_{x,y \in L} |ND_{P_l}(x,y) - ND_{P_c}(x,y)|.$$
We can break the right-hand side summation into two, for $L_p$ and $L \setminus L_p$, as follows:
$$\sum_{x,y \in L_p} |ND_{P_p}(x,y) - ND_{P_c}(x,y)| \leq \sum_{x,y \in L_p} |ND_{P_l}(x,y) - ND_{P_c}(x,y)| + \sum_{(x,y) \in L^2 \setminus L_p^2} |ND_{P_l}(x,y) - ND_{P_c}(x,y)|.$$
The nodal distance between $x$ and $y$ is the same in $P_p$ and $P_l$ for $x, y \in L_p$. Therefore, the sums over $L_p$ cancel each other, and we have the following:
$$0 \leq \sum_{(x,y) \in L^2 \setminus L_p^2} |ND_{P_l}(x,y) - ND_{P_c}(x,y)|.$$
Since the right-hand side is a summation of absolute values, the inequality holds, which completes the proof. □
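Given the pairwise nodal distances, the lower bound is a sum over the leaf pairs of the partial phylogeny only. The following small Python sketch assumes that nodal distances are supplied as dictionaries keyed by unordered leaf pairs (for instance, computed by a post-order traversal). The function name `lower_bound_nodal` is hypothetical.

```python
def lower_bound_nodal(nd_partial, nd_complete):
    """LB_n(P_p, P_c): sum of |ND_Pp(x, y) - ND_Pc(x, y)| over leaf pairs of P_p.

    nd_partial:  ND values of the partial phylogeny P_p, keyed by
                 frozenset({x, y}) for x, y in L_p.
    nd_complete: ND values of the phylogeny P_c for all leaf pairs of L.
    """
    return sum(abs(d - nd_complete[pair]) for pair, d in nd_partial.items())


nd_pp = {frozenset(("a", "b")): 2}
nd_pc = {frozenset(("a", "b")): 3, frozenset(("a", "c")): 2, frozenset(("b", "c")): 3}
print(lower_bound_nodal(nd_pp, nd_pc))  # 1
```

Pairs involving a leaf outside $L_p$ contribute nothing here. This is why the bound can only underestimate $D_n$ for any completion.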

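Proof of Proposition 4

Let $S_p$ be the set of all completions of the partial phylogeny $P_p$. For every $P \in S_p$, we need to prove that
$$UB_n(P_p, P_c) \geq D_n(P, P_c).$$
Let $P_u \in \arg\max_{P \in S_p} D_n(P, P_c)$ be a completion at largest distance. Then it is enough to prove that
$$UB_n(P_p, P_c) \geq D_n(P_u, P_c).$$
If we replace $UB_n$ and $D_n$ with their definitions, the inequality is
$$\sum_{x,y \in L_p} |ND_{P_p}(x,y) - ND_{P_c}(x,y)| + \left(\binom{|L|}{2} - \binom{|L_p|}{2}\right) \times l \geq \sum_{x,y \in L} |ND_{P_u}(x,y) - ND_{P_c}(x,y)|.$$
We can break the right-hand side summation into two, for $L_p$ and $L \setminus L_p$, as follows:
$$\sum_{x,y \in L_p} |ND_{P_p}(x,y) - ND_{P_c}(x,y)| + \left(\binom{|L|}{2} - \binom{|L_p|}{2}\right) \times l \geq \sum_{x,y \in L_p} |ND_{P_u}(x,y) - ND_{P_c}(x,y)| + \sum_{(x,y) \in L^2 \setminus L_p^2} |ND_{P_u}(x,y) - ND_{P_c}(x,y)|.$$
The nodal distance between $x$ and $y$ is the same in $P_p$ and $P_u$ for $x, y \in L_p$, so the sums over $L_p$ cancel each other:
$$\left(\binom{|L|}{2} - \binom{|L_p|}{2}\right) \times l \geq \sum_{(x,y) \in L^2 \setminus L_p^2} |ND_{P_u}(x,y) - ND_{P_c}(x,y)|.$$
The maximum nodal distance in a tree is equal to the number of leaves; therefore, each term in the right-hand side of the inequality is at most $l$. Since there are $\binom{|L|}{2} - \binom{|L_p|}{2}$ terms in the right-hand side summation, $\left(\binom{|L|}{2} - \binom{|L_p|}{2}\right) \times l$ is greater than or equal to the summation. □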
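The upper bound can be sketched in Python in the same spirit. The sketch assumes that the sum over leaf pairs of $L_p$ has already been computed, and that $l$ is passed in as the per-pair bound used in the proof. The function name is hypothetical.

```python
from math import comb


def upper_bound_nodal(lp_sum, num_leaves, num_partial_leaves, l):
    """UB_n(P_p, P_c) = lp_sum + (C(|L|, 2) - C(|L_p|, 2)) * l.

    lp_sum: sum of |ND_Pp(x, y) - ND_Pc(x, y)| over leaf pairs of L_p.
    l:      bound on any single term |ND(x, y) - ND'(x, y)|.
    """
    # Leaf pairs with at least one leaf outside L_p: each contributes at most l.
    missing_pairs = comb(num_leaves, 2) - comb(num_partial_leaves, 2)
    return lp_sum + missing_pairs * l


print(upper_bound_nodal(1, 3, 2, 3))  # 1 + (3 - 1) * 3 = 7
```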

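Proof of Proposition 5

Take any plan-completion $X$ of the partial plan $P_p$. Consider two cases.

Case 1: $|X| \leq |P_c|$. Our goal is to prove that
$$LB_h(P_p, P_c) \leq D_h(X, P_c).$$
By the definition of $D_h$, the distance between $X$ and $P_c$ is:
$$D_h(X, P_c) = |\{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |X| \}| + |P_c| - |X|.$$
Since $X$ is a plan-completion of $P_p$ and $|X| \leq |P_c|$, we have $\mathit{dom}\ \mathit{act}_{P_p} \subseteq \mathit{dom}\ \mathit{act}_{P_c}$; then, by the definition of $LB_h$,
$$LB_h(P_p, P_c) = |\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_c}(i),\ i \in \mathit{dom}\ \mathit{act}_{P_p} \}|.$$
Since $X$ is a plan-completion of $P_p$,
$$\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_c}(i),\ i \in \mathit{dom}\ \mathit{act}_{P_p} \} \subseteq \{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |X| \}.$$
Hence,
$$LB_h(P_p, P_c) \leq |\{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |X| \}| + |P_c| - |X| = D_h(X, P_c).$$

Case 2: $|X| > |P_c|$. Our goal is to prove that
$$LB_h(P_p, P_c) \leq D_h(P_c, X).$$
By the definition of $D_h$, the distance between $X$ and $P_c$ is:
$$D_h(P_c, X) = |\{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |P_c| \}| + |X| - |P_c|.$$
By the definition of $LB_h$,
$$LB_h(P_p, P_c) = |\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_c}(i),\ i \in \mathit{dom}\ \mathit{act}_{P_p},\ 1 \leq i \leq |P_c| \}| + |\{ i \mid l \geq i > |P_c|,\ i \in \mathit{dom}\ \mathit{act}_{P_p} \}|.$$
Since $X$ is a plan-completion of $P_p$, $\mathit{act}_X$ extends $\mathit{act}_{P_p}$, and then
$$\{ i \mid \mathit{act}_{P_p}(i) \neq \mathit{act}_{P_c}(i),\ i \in \mathit{dom}\ \mathit{act}_{P_p},\ 1 \leq i \leq |P_c| \} \subseteq \{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |P_c| \}.$$
Since $|X| > |P_c|$ and every step in $\mathit{dom}\ \mathit{act}_{P_p}$ is at most $|X|$,
$$|X| - |P_c| \geq |\{ i \mid |X| \geq i > |P_c|,\ i \in \mathit{dom}\ \mathit{act}_{P_p} \}| = |\{ i \mid l \geq i > |P_c|,\ i \in \mathit{dom}\ \mathit{act}_{P_p} \}|.$$
Hence,
$$LB_h(P_p, P_c) \leq |\{ i \mid \mathit{act}_X(i) \neq \mathit{act}_{P_c}(i),\ 1 \leq i \leq |P_c| \}| + |X| - |P_c| = D_h(P_c, X).$$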
□ 
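The lower bound of Proposition 5 can be checked exhaustively on small instances. The following Python sketch is only an illustration and is not part of the paper's ASP formulations. It assumes that a plan is a sequence of actions indexed from 1 and that a partial plan is a dictionary from (1-based) time steps to actions. It encodes $D_h$ and $LB_h$ as they are written in the proof above. The helper names `hamming_plans`, `lb_h`, `is_completion` and `check_lower_bound` are hypothetical.

```python
from itertools import product


def hamming_plans(p, q):
    """D_h: mismatching steps on the common prefix plus the length difference."""
    short, long_ = (p, q) if len(p) <= len(q) else (q, p)
    mismatches = sum(1 for i in range(len(short)) if short[i] != long_[i])
    return mismatches + len(long_) - len(short)


def lb_h(partial, complete):
    """LB_h: known steps that disagree with `complete`, plus known steps
    of the partial plan lying beyond the end of `complete`."""
    n = len(complete)
    differ = sum(1 for i, a in partial.items() if i <= n and a != complete[i - 1])
    beyond = sum(1 for i in partial if i > n)
    return differ + beyond


def is_completion(plan, partial):
    """A plan completes a partial plan if it agrees on every known step."""
    return all(i <= len(plan) and plan[i - 1] == a for i, a in partial.items())


def check_lower_bound(partial, complete, actions, max_len):
    """Enumerate every completion of length <= max_len and check LB_h <= D_h."""
    lb = lb_h(partial, complete)
    for length in range(1, max_len + 1):
        for plan in product(actions, repeat=length):
            if is_completion(plan, partial):
                assert lb <= hamming_plans(plan, complete), plan
    return lb


# Known actions at steps 1 and 2; the complete plan has length 3.
print(check_lower_bound({1: "a", 2: "x"}, ("a", "b", "c"), "abcx", 4))  # 1
```

Here `max_len` plays the role of $l$. Brute-force enumeration is only feasible for tiny action sets and plan lengths, but it is enough to exercise both cases of the proof.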

Proof of Proposition 6

Take any plan-completion $X$ of the partial plan $P_p$. Consider two cases.

Case 1: $|X| \le |P_c|$. Our goal is to prove that

$$UB_h(P_p, P_c) \ge D_h(X, P_c)$$

where

$$UB_h(P_p, P_c) = l - |\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p}\}|,$$

$$D_h(X, P_c) = |\{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |X|\}| + |P_c| - |X|.$$

Since $|X| \le |P_c|$ and $X$ is a plan-completion of $P_p$, the set

$$\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p}\}$$

does not intersect with the set

$$Y = \{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |X|\} \cup \{i \mid |X| < i \le |P_c|\}.$$

Then

$$\{1, \ldots, l\} \setminus \{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p}\}$$

is a superset of $Y$. Therefore,

$$UB_h(P_p, P_c) = l - |\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p}\}| \ge |\{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |X|\}| + |P_c| - |X| = D_h(X, P_c).$$

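Case 2: $|X| > |P_c|$. Our goal is to prove that

$$UB_h(P_p, P_c) \ge D_h(P_c, X)$$

where

$$UB_h(P_p, P_c) = l - |\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p},\ 1 \le i \le |P_c|\}|,$$

$$D_h(P_c, X) = |\{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |P_c|\}| + |X| - |P_c|.$$

Since $|X| > |P_c|$ and $X$ is a plan-completion of $P_p$, the set

$$\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p},\ 1 \le i \le |P_c|\}$$

does not intersect with the set

$$Y = \{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |P_c|\} \cup \{i \mid |P_c| < i \le |X|\}.$$

Then

$$\{1, \ldots, l\} \setminus \{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p},\ 1 \le i \le |P_c|\}$$

is a superset of $Y$. Therefore,

$$UB_h(P_p, P_c) = l - |\{i \mid act_{P_p}(i) = act_{P_c}(i),\ i \in \mathrm{dom}\, act_{P_p},\ 1 \le i \le |P_c|\}| \ge |\{i \mid act_X(i) \ne act_{P_c}(i),\ 1 \le i \le |P_c|\}| + |X| - |P_c| = D_h(P_c, X).$$

□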
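The upper bound of Proposition 6 can be exercised in the same brute-force way. The following Python sketch is only an illustration. It assumes that plans are sequences indexed from 1, that a partial plan is a dictionary from steps to actions, and that the maximum plan length $l$ (`max_len`) is at least $|P_c|$. The names `hamming_plans`, `ub_h` and `max_distance_of_completions` are hypothetical.

```python
from itertools import product


def hamming_plans(p, q):
    """D_h: mismatching steps on the common prefix plus the length difference."""
    short, long_ = (p, q) if len(p) <= len(q) else (q, p)
    mismatches = sum(1 for i in range(len(short)) if short[i] != long_[i])
    return mismatches + len(long_) - len(short)


def ub_h(partial, complete, max_len):
    """UB_h = l minus the number of known steps that agree with `complete`."""
    agree = sum(1 for i, a in partial.items()
                if i <= len(complete) and a == complete[i - 1])
    return max_len - agree


def max_distance_of_completions(partial, complete, actions, max_len):
    """Largest D_h between `complete` and any completion of length <= max_len."""
    best = 0
    for length in range(1, max_len + 1):
        for plan in product(actions, repeat=length):
            if all(i <= length and plan[i - 1] == a for i, a in partial.items()):
                best = max(best, hamming_plans(plan, complete))
    return best


partial, complete = {1: "a", 2: "x"}, ("a", "b", "c")
print(ub_h(partial, complete, 4), max_distance_of_completions(partial, complete, "abcx", 4))  # 3 3
```

In this small instance the bound is tight. The completion `a x y z` with `y` different from `c` reaches distance 3.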
Appendix B ASP Formulations

% choose exactly c vertices
c{clique(X) : vertex(X)}c.

% any two distinct chosen vertices must be adjacent
:- clique(X), clique(Y), vertex(X), vertex(Y), X!=Y, not edge(X,Y),
   not edge(Y,X).

Fig. B 1. ASP formulation of the c-clique problem (a clique of size c).



solution(1..n).

% choose a c-clique for each of the n solutions
c{clique(S,X) : vertex(X)}c :- solution(S).

:- clique(S,X), clique(S,Y), not edge(X,Y), not edge(Y,X), X!=Y.

% solutions S1 and S2 differ if S1 contains a vertex that S2 lacks
different(S1,S2) :- clique(S1,X), not clique(S2,X), vertex(X),
                    solution(S1;S2), S1 != S2.
:- not different(S1,S2), solution(S1;S2), S1!=S2.

Fig. B 2. ASP formulation that computes n distinct c-cliques.
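For intuition, the answer sets of Fig. B 1 and the n distinct solutions of Fig. B 2 can be mimicked offline by brute-force enumeration. The Python sketch below is an illustration under stated assumptions, not a replacement for the ASP formulation. It assumes an undirected graph given as a set of vertex pairs, and the helper names are hypothetical. Two c-cliques count as distinct exactly when their vertex sets differ.

```python
from itertools import combinations


def is_clique(vertices, edges):
    """Every two distinct vertices are joined by an edge, in either direction."""
    return all((x, y) in edges or (y, x) in edges for x, y in combinations(vertices, 2))


def c_cliques(vertices, edges, c):
    """All vertex sets of size c that form a clique (the answer sets of Fig. B 1)."""
    return [frozenset(s) for s in combinations(sorted(vertices), c) if is_clique(s, edges)]


def n_distinct_cliques(vertices, edges, c, n):
    """n pairwise distinct c-cliques, or None if the graph has fewer than n."""
    cliques = c_cliques(vertices, edges, c)
    return cliques[:n] if len(cliques) >= n else None


V = {1, 2, 3, 4}
E = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
print(n_distinct_cliques(V, E, 3, 2))  # [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
print(n_distinct_cliques(V, E, 3, 3))  # None
```

Unlike the ASP program, which finds the n cliques in a single solver call, this sketch enumerates all candidates first. That is only practical for very small graphs.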



% same(S1,S2,V): vertex V belongs to both cliques S1 and S2
same(S1,S2,V) :- clique(S1,V), clique(S2,V), solution(S1;S2),
                 vertex(V), S1 < S2.
% the distance is c minus the number H of shared vertices
hammingDistance(S1,S2,c-H) :- H{same(S1,S2,V) : vertex(V)}H,
                 solution(S1;S2), maximumDistance(H), S1 < S2.

Fig. B 3. ASP formulation of the Hamming distance between two cliques.



:- solution(S1;S2), hammingDistance(S1,S2,H), H > k,
   maximumDistance(H).

Fig. B 4. A constraint that forces the distance between any two solutions to be less than or equal to k.
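The distance of Fig. B 3 and the constraint of Fig. B 4 amount to a simple set computation. The Python sketch below illustrates it, assuming cliques are given as Python sets of equal size c. The function names are hypothetical.

```python
from itertools import combinations


def clique_distance(c1, c2):
    """Distance of Fig. B 3: c minus the number of vertices the cliques share."""
    if len(c1) != len(c2):
        raise ValueError("both cliques must have the same size c")
    return len(c1) - len(c1 & c2)


def within_k(cliques, k):
    """Constraint of Fig. B 4: every two solutions are at distance <= k."""
    return all(clique_distance(a, b) <= k for a, b in combinations(cliques, 2))


a, b, d = frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({4, 5, 6})
print(clique_distance(a, b), within_k([a, b], 1), within_k([a, b, d], 2))  # 1 True False
```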
% generate a rooted binary tree

vertex(0..2*k). root(2*k).

internal(X) :- vertex(X), not leaf(X).

2 {edge(X,Y) : vertex(Y) : X > Y} 2 :- internal(X).

reachable(X,Y) :- edge(X,Y), vertex(X;Y), X > Y.
reachable(X,Y) :- edge(X,Z), reachable(Z,Y),
                  X > Z, vertex(X;Y;Z).
:- vertex(Y), not reachable(X,Y), root(X), Y != X.
:- reachable(X,X), vertex(X).

maxY(X,Y) :- edge(X,Y), edge(X,Y1), Y > Y1,
             vertex(Y;Y1), internal(X).
:- maxY(X,Y), maxY(X1,Y1), Y > Y1, X < X1,
   vertex(Y;Y1), internal(X;X1).

Fig. B 5. The phylogeny reconstruction program of Brooks et al.: Part 1.
% ensure that the tree does not have more than x incompatible characters
g0(X,I,S) :- f(X,I,S), informative_character(I),
             essential_state(I,S).
g0(Y,I,S) :- g0(X,I,S), g0(X1,I,S), edge(Y,X), edge(Y,X1),
             X>X1, internal(Y), vertex(X;X1), informative_character(I),
             essential_state(I,S).

marked(X,I) :- g0(X,I,S), informative_character(I),
               vertex(X), essential_state(I,S).

g(X,I,S) :- g0(X,I,S), informative_character(I),
            vertex(X), essential_state(I,S).
{g(X,I,S) : essential_state(I,S)} 1 :- internal(X),
            not marked(X,I), informative_character(I).

{root_is(X,I,S)} :- g(X,I,S), vertex(X),
            informative_character(I), essential_state(I,S).

:- root_is(X,I,S), root_is(Y,I,S),
   vertex(X;Y), X < Y, informative_character(I),
   essential_state(I,S).
:- root_is(X,I,S), g(Y,I,S), reachable(Y,X), Y > X,
   vertex(X;Y), informative_character(I),
   essential_state(I,S).

reachable_is(X,I,S) :- root_is(X,I,S),
   vertex(X), informative_character(I), essential_state(I,S).
reachable_is(X,I,S) :- g(X,I,S), reachable_is(Z,I,S),
   edge(Z,X), Z > X, vertex(X;Z), informative_character(I),
   essential_state(I,S).

incompatible(I) :- g(X,I,S), not reachable_is(X,I,S),
   vertex(X), informative_character(I), essential_state(I,S).
:- x+1 {incompatible(I) : informative_character(I)}.

Fig. B 6. The phylogeny reconstruction program of Brooks et al.: Part 2.
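In Fig. B 6, a character is counted as incompatible when some vertex labelled with a state is not reachable, through vertices with the same label, from the unique topmost vertex (`root_is`) of that state. In other words, the vertices sharing a state must form a connected part of the tree. The Python sketch below illustrates this classical notion of compatibility and is only an assumption-laden simplification. It labels every internal vertex with one of the leaf states and tries all labelings by brute force. It ignores the essential-state and informative-character refinements of Fig. B 6. Trees are child-to-parent dictionaries, and all names are hypothetical.

```python
from itertools import product


def components(parent, vertices):
    """Connected pieces of a vertex set in a rooted tree: each piece has
    exactly one vertex whose parent lies outside the set."""
    return sum(1 for v in vertices if parent.get(v) not in vertices)


def is_compatible(parent, internal, leaf_states):
    """True if some labeling of the internal vertices makes every state's
    vertex set connected (internal states drawn from the leaf states)."""
    states = sorted(set(leaf_states.values()))
    for choice in product(states, repeat=len(internal)):
        labeling = {**leaf_states, **dict(zip(internal, choice))}
        by_state = {}
        for v, s in labeling.items():
            by_state.setdefault(s, set()).add(v)
        if all(components(parent, vs) == 1 for vs in by_state.values()):
            return True
    return False


def count_incompatible(parent, internal, characters):
    """characters: character name -> {leaf: state}."""
    return sum(1 for states in characters.values()
               if not is_compatible(parent, internal, states))


tree = {"a": "u", "b": "u", "c": "v", "d": "v", "u": "r", "v": "r"}
chars = {"c1": {"a": 0, "b": 0, "c": 1, "d": 1},
         "c2": {"a": 0, "b": 1, "c": 0, "d": 1}}
print(count_incompatible(tree, ["u", "v", "r"], chars))  # 1
```

A tree then satisfies the final constraint of Fig. B 6 when this count is at most x. The exhaustive labeling search is exponential and is meant only for toy trees.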
% generate n rooted trees

solution(1..n).

vertex(0..2*k). root(2*k).

internal(X) :- vertex(X), not leaf(X).

2 {edge(N,X,Y) : vertex(Y) : X > Y} 2 :- internal(X), solution(N).

reachable(N,X,Y) :- edge(N,X,Y), vertex(X;Y), X > Y, solution(N).
reachable(N,X,Y) :- edge(N,X,Z), reachable(N,Z,Y), solution(N),
                    X > Z, vertex(X;Y;Z).
:- vertex(Y), not reachable(N,X,Y), root(X), Y != X, solution(N).
:- reachable(N,X,X), vertex(X), solution(N).

maxY(N,X,Y) :- edge(N,X,Y), edge(N,X,Y1), Y > Y1,
               vertex(Y;Y1), internal(X), solution(N).
:- maxY(N,X,Y), maxY(N,X1,Y1), Y > Y1, X < X1,
   vertex(Y;Y1), internal(X;X1), solution(N).

Fig. B 7. A reformulation of the phylogeny reconstruction program of Brooks et al. (Figures B 5 and B 6), to find n distinct phylogenies: Part 1.
% ensure that no tree has more than x incompatible characters
g0(N,X,I,S) :- f(X,I,S), informative_character(I),
               essential_state(I,S), solution(N).
g0(N,Y,I,S) :- g0(N,X,I,S), g0(N,X1,I,S), edge(N,Y,X), edge(N,Y,X1),
               X>X1, internal(Y), vertex(X;X1), informative_character(I),
               essential_state(I,S), solution(N).

marked(N,X,I) :- g0(N,X,I,S), informative_character(I),
                 vertex(X), essential_state(I,S), solution(N).

g(N,X,I,S) :- g0(N,X,I,S), informative_character(I),
              vertex(X), essential_state(I,S), solution(N).

{g(N,X,I,S) : essential_state(I,S)} 1 :- internal(X),
              not marked(N,X,I), informative_character(I), solution(N).

{root_is(N,X,I,S)} :- g(N,X,I,S), vertex(X),
              informative_character(I), essential_state(I,S), solution(N).

:- root_is(N,X,I,S), root_is(N,Y,I,S),
   vertex(X;Y), X < Y, informative_character(I),
   essential_state(I,S), solution(N).
:- root_is(N,X,I,S), g(N,Y,I,S), reachable(N,Y,X), Y > X,
   vertex(X;Y), informative_character(I), essential_state(I,S),
   solution(N).

reachable_is(N,X,I,S) :- root_is(N,X,I,S),
   vertex(X), informative_character(I), essential_state(I,S),
   solution(N).
reachable_is(N,X,I,S) :- g(N,X,I,S), reachable_is(N,Z,I,S),
   edge(N,Z,X), Z > X, vertex(X;Z), informative_character(I),
   essential_state(I,S), solution(N).

incompatible(N,I) :- g(N,X,I,S), not reachable_is(N,X,I,S),
   vertex(X), informative_character(I), essential_state(I,S),
   solution(N).

:- x+1 {incompatible(N,I) : informative_character(I)}, solution(N).

Fig. B 8. A reformulation of the phylogeny reconstruction program of Brooks et al. (Figures B 5 and B 6), to find n distinct phylogenies: Part 2.



% make sure that these n trees are distinct

different(S1,S2) :- edge(S1,X1,Y), edge(S2,X2,Y),
   vertex(X2;X1;Y), solution(S1;S2), S1 != S2, X1 != X2.
:- not different(S1,S2), solution(S1;S2), S1 != S2.

Fig. B 9. A reformulation of the phylogeny reconstruction program of Brooks et al., to
find n distinct phylogenies: Part 3.
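Because all n trees in Figs. B 7 to B 9 share the same numbered vertices, two trees differ exactly when some vertex Y has different parents. The Python sketch below mirrors that test. It assumes trees are stored as child-to-parent dictionaries over the same vertices, and the function names are hypothetical.

```python
from itertools import combinations


def different(t1, t2):
    """Two trees over the same vertices differ if some vertex has different parents."""
    return any(t1[y] != t2[y] for y in t1.keys() & t2.keys())


def pairwise_distinct(trees):
    """The constraint of Fig. B 9: every pair of the n trees must differ."""
    return all(different(a, b) for a, b in combinations(trees, 2))


# Vertices 0..4 (k = 2), root 4, leaves 0, 1, 2; parents have larger numbers.
t1 = {0: 3, 1: 3, 2: 4, 3: 4}
t2 = {0: 3, 2: 3, 1: 4, 3: 4}
print(pairwise_distinct([t1, t2]), pairwise_distinct([t1, t2, t1]))  # True False
```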



dist(0..m).

% compute the nodal distances using distance_v.

% nodaldistance(S,X,Y,T): the nodal distance between X and Y
% in the S'th tree is T.

nodaldistance(S,X,Y,T) :- tempnodaldistance(S,X,Y,T),
    not notminnodal(S,X,Y,T), solution(S), leaf(X;Y), dist(T).

% distance_v(S,X,Y,T): the distance between the vertex X and
% its descendant Y is T in the S'th tree.

distance_v(S,X,Y,1) :- edge(S,X,Y), vertex(X;Y), solution(S).

distance_v(S,X,Z,T+1) :- distance_v(S,X,Y,T), edge(S,Y,Z),
    vertex(X;Y;Z), dist(T), solution(S).

% length of a path between vertices X and Y
tempnodaldistance(S,X,Y,T1+T2) :- distance_v(S,CA,X,T1),
    distance_v(S,CA,Y,T2), X<Y, dist(T1;T2), leaf(X;Y),
    vertex(CA), solution(S).

notminnodal(S,X,Y,T1) :- tempnodaldistance(S,X,Y,T1),
    tempnodaldistance(S,X,Y,T2), T2 < T1, leaf(X;Y),
    dist(T1;T2), solution(S).

% compute the differences of nodal distances of each pair of
% leaves in each pair of trees.

diffnodal(P1,P2,X,Y,abs(D1-D2)) :- nodaldistance(P1,X,Y,D1),
    nodaldistance(P2,X,Y,D2), P2>P1, leaf(X;Y), dist(D1;D2),
    solution(P1;P2).

% compute the distance between each pair of trees.

% distance_t(P1,P2,T): the distance between (trees) P1 and P2 is T.
tempdistance(P1,P2,0,1,D) :- diffnodal(P1,P2,0,1,D),
    solution(P1;P2), dist(D).
tempdistance(P1,P2,L1,L2,D+K) :- tempdistance(P1,P2,L1,L2-1,D),
    diffnodal(P1,P2,L1,L2,K), L2-L1 > 1, solution(P1;P2),
    leaf(L1;L2), dist(D;K).
tempdistance(P1,P2,L1,L2,D+K) :- tempdistance(P1,P2,L1-1,k,D),
    diffnodal(P1,P2,L1,L2,K), L2 = L1+1, L2 > 1, solution(P1;P2),
    leaf(L1;L2), dist(D;K).

distance_t(P1,P2,T) :- tempdistance(P1,P2,k-1,k,T), dist(T),
    solution(P1;P2).

Fig. B 10. A formulation of the nodal distance function in ASP.
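The three `tempdistance` rules of Fig. B 10 accumulate the per-pair differences `diffnodal` in a fixed order. They start at the pair (0, 1) and advance along a row from (L1, L2-1) to (L1, L2). They jump from the last pair (L1-1, k) of one row to the first pair (L1, L1+1) of the next, so the value stored at (k-1, k) is the distance. The Python sketch below mirrors this order. It assumes, as the rules suggest, that leaves are numbered 0..k. The function name is hypothetical.

```python
def accumulate_pairs(diff, k):
    """Running sum over leaf pairs (L1, L2), 0 <= L1 < L2 <= k, in the order of
    the tempdistance rules; `diff` maps each pair to its nodal-distance difference."""
    temp = {(0, 1): diff[(0, 1)]}                 # first rule: the pair (0, 1)
    for l1 in range(k):
        for l2 in range(l1 + 1, k + 1):
            if (l1, l2) == (0, 1):
                continue
            if l2 - l1 > 1:                       # second rule: same row
                prev = temp[(l1, l2 - 1)]
            else:                                 # third rule: a new row starts
                prev = temp[(l1 - 1, k)]
            temp[(l1, l2)] = prev + diff[(l1, l2)]
    return temp[(k - 1, k)]                       # value read by distance_t


k = 3
diff = {(i, j): 10 * i + j for i in range(k) for j in range(i + 1, k + 1)}
print(accumulate_pairs(diff, k), sum(diff.values()))  # 54 54
```

The chained sum is used because every intermediate value must be a ground atom whose arguments stay within `dist(0..m)`. In Python it simply equals the plain sum over all pairs.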
dist(0..m).

% at each solution N, define reachability of leaf Y from X
reachableN(N,X,Y) :- edge(N,X,Y), vertex(X), leaf(Y), X > Y,
    solution(N).

reachableN(N,X,Y) :- edge(N,X,Z), reachableN(N,Z,Y),
    solution(N), X > Z, vertex(X;Z), leaf(Y).

% at each solution S, assign depths to vertices Y

depth(S,2*k,0) :- solution(S).

depth(S,Y,T+1) :- depth(S,X,T), edge(S,X,Y),
    vertex(X;Y), depthRange(T), solution(S), T<r.

% vertices V1 and V2 have different descendants

diff(N1,V1,N2,V2) :- solution(N1;N2), vertex(V1;V2), leaf(X),
    N1 < N2, reachableN(N1,V1,X), not reachableN(N2,V2,X).

diff(N1,V1,N2,V2) :- solution(N1;N2), vertex(V1;V2), leaf(X),
    N1 < N2, not reachableN(N1,V1,X), reachableN(N2,V2,X).

% definition of the function f

fN(N1,V1,N2,V2,1) :- diff(N1,V1,N2,V2), solution(N1;N2),
    vertex(V1;V2), N1 < N2.
fN(N1,V1,N2,V2,0) :- not diff(N1,V1,N2,V2), solution(N1;N2),
    vertex(V1;V2), N1 < N2.

% definition of the function g
gN(0,N1,N2,0) :- solution(N1;N2), N1 < N2.

gN(D+1,N1,N2,D1) :- gN(D,N1,N2,X), solution(N1;N2), N1 < N2,
    depthRange(D;Y), dist(Z;D1;X), maxdepth2(N1,N2,Y), D<Y,
    depthV2(N1,N2,D+1,2*k,Z), w(D+1,M), D1=X+M*Z.

Fig. B 11. An ASP formulation of the descendant distance function $D_l$ for two phylogenies: Part 1.
% depthV2 computes the summation of f(x,y) over all x,y at the same depth
samedepth(N1,V1,N2,V2,D) :- depth(N1,V1,D), depth(N2,V2,D),
    vertex(V1;V2), solution(N1;N2), N1 < N2, depthRange(D).

depthV(N1,N2,D,W,0,Z) :- solution(N1;N2), N1 < N2, depthRange(D),
    vertex(W), samedepth(N1,W,N2,0,D), fN(N1,W,N2,0,Z), dist(Z).

depthV(N1,N2,D,W,0,0) :- solution(N1;N2), N1 < N2, depthRange(D),
    vertex(W), not samedepth(N1,W,N2,0,D).

depthV(N1,N2,D,W,X+1,Z+Z1) :- solution(N1;N2), N1 < N2,
    depthRange(D), vertex(W), depthV(N1,N2,D,W,X,Z),
    samedepth(N1,W,N2,X+1,D), fN(N1,W,N2,X+1,Z1), dist(Z;Z1),
    vertex(X), X<2*k.

depthV(N1,N2,D,W,X+1,Z) :- solution(N1;N2), N1 < N2, depthRange(D),
    vertex(W), depthV(N1,N2,D,W,X,Z),
    not samedepth(N1,W,N2,X+1,D), dist(Z), vertex(X), X<2*k.

depthV2(N1,N2,D,0,Z) :- solution(N1;N2), N1 < N2, depthRange(D),
    depthV(N1,N2,D,0,2*k,Z), dist(Z).
depthV2(N1,N2,D,X+1,Z+Z1) :- solution(N1;N2), N1 < N2,
    depthRange(D), vertex(X), depthV2(N1,N2,D,X,Z),
    depthV(N1,N2,D,X+1,2*k,Z1), dist(Z;Z1), X<2*k.

% definition of the distance function D_n for two phylogenies
depth2(N1,N2,X) :- depth(N1,Y1,X), depth(N2,Y2,X),
    vertex(Y1;Y2), depthRange(X), solution(N1;N2), N1 < N2.
maxdepth2(N1,N2,X) :- depth2(N1,N2,X), not depth2(N1,N2,X+1),
    depthRange(X), solution(N1;N2), N1 < N2.

distance_t(N1,N2,X) :- gN(D,N1,N2,X), solution(N1;N2), N1 < N2,
    dist(X), depthRange(D), maxdepth2(N1,N2,D).

Fig. B 12. An ASP formulation of the descendant distance function $D_l$ for two phylogenies: Part 2.
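Read literally, Figures B 11 and B 12 compute the following for each depth d from 1 up to the deepest level present in both trees (`maxdepth2`). They count the pairs (x, y), with x at depth d in the first tree and y at depth d in the second, whose sets of leaf descendants differ (`fN`). They weight that count by w(d) and add the results up (`gN`). The Python sketch below follows these rules as written and is not an independent definition of $D_l$. It assumes child-to-parent dictionaries, a weight function `w` passed in as a callable, and leaf descendants taken strictly below a vertex, as in `reachableN`. All names are hypothetical.

```python
def depths(parent, root):
    """Depth of every vertex (root at depth 0) of a child -> parent tree."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    depth, stack = {root: 0}, [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            stack.append(c)
    return depth


def leaf_descendants(parent, leaves):
    """Leaves strictly below each vertex (leaves themselves get no entry)."""
    desc = {}
    for leaf in leaves:
        v = leaf
        while v in parent:
            v = parent[v]
            desc.setdefault(v, set()).add(leaf)
    return {v: frozenset(s) for v, s in desc.items()}


def descendant_distance(t1, r1, t2, r2, leaves, w):
    """Sum over depths d = 1..(deepest level present in both trees) of
    w(d) times the number of same-depth pairs whose leaf sets differ."""
    d1, d2 = depths(t1, r1), depths(t2, r2)
    desc1, desc2 = leaf_descendants(t1, leaves), leaf_descendants(t2, leaves)
    max_common = min(max(d1.values()), max(d2.values()))
    total = 0
    for d in range(1, max_common + 1):
        level1 = [v for v, dv in d1.items() if dv == d]
        level2 = [v for v, dv in d2.items() if dv == d]
        f = sum(1 for x in level1 for y in level2
                if desc1.get(x, frozenset()) != desc2.get(y, frozenset()))
        total += w(d) * f
    return total


T1 = {"a": "r", "v": "r", "b": "v", "c": "v"}
T2 = {"c": "r", "u": "r", "a": "u", "b": "u"}
print(descendant_distance(T1, "r", T2, "r", "abc", lambda d: 1))  # 3
```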

% distance of a set of phylogenies

notmaxdistance_t(P1,P2,T1) :- distance_t(P1,P2,T1), distance_t(P3,P4,T2),
    T1 < T2, solution(P1;P2;P3;P4), dist(T1;T2).
delta(T1) :- distance_t(P1,P2,T1), not notmaxdistance_t(P1,P2,T1),
    solution(P1;P2), dist(T1).

% constraints on the distance function, for similarity
:- delta(T), dist(T), T > k.

Fig. B 13. An ASP formulation of the distance function $\Delta$ for a set of phylogenies, and
the constraints for k-similarity.
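The predicate `delta/1` of Fig. B 13 selects the largest pairwise distance among the computed solutions, and the constraint rejects sets in which it exceeds k. The Python sketch below expresses the same maximum. It assumes at least two solutions and an arbitrary distance function passed as a callable, and the names are hypothetical. The same maximum is computed for plans by `totaldistance` in Fig. B 17.

```python
from itertools import combinations


def delta(solutions, distance):
    """Largest pairwise distance among at least two solutions."""
    return max(distance(a, b) for a, b in combinations(solutions, 2))


def k_similar(solutions, distance, k):
    """The constraint of Fig. B 13: no pair of solutions is more than k apart."""
    return delta(solutions, distance) <= k


points = [1, 4, 6]
print(delta(points, lambda a, b: abs(a - b)), k_similar(points, lambda a, b: abs(a - b), 4))  # 5 False
```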
% effect of moving a block

on(B,L,T1) :- block(B), location(L), moveop(B,L,T), next(T,T1).

% a block can be moved only when it's clear
:- location(L), block(B), block(B1), time(T),
   moveop(B,L,T), on(B1,B,T).

% no two blocks can be on the same block at the same time
:- 2{on(B1,B,T) : block(B1)}, time(T), block(B).

% wherever a block is, it's not anywhere else
non(B,L1,T) :- time(T), location(L1), location(L), block(B),
   on(B,L,T), not eq(L,L1).

% every block is supported by the table

supported(B,T) :- block(B), time(T), on(B,table,T).

supported(B,T) :- block(B), block(B1), time(T), on(B,B1,T),
   supported(B1,T), not eq(B,B1).
:- block(B), time(T), not supported(B,T).

% no concurrency

:- 2 {moveop(B,L,T) : block(B) : location(L)}, time(T).
% inertia

on(B,L,T1) :- location(L), block(B), on(B,L,T), not non(B,L,T1),
   next(T,T1).

% initial values and actions are exogenous

1{non(B,L,0), on(B,L,0)}1 :- block(B), location(L).

:- non(B,L,T), on(B,L,T), block(B), location(L), time(T).

{moveop(B,L,T)} :- block(B), location(L), time(T), T < lasttime.

% auxiliary predicates
time(0..lasttime).

next(T,T+1) :- time(T), lt(T,lasttime).

location(L) :- block(L).
location(table).

goal :- time(T), goal(T).
:- not goal.

Fig. B 14. Blocks world formulation.
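To see what the effect, precondition and inertia rules of Fig. B 14 allow, the following Python sketch simulates a given plan. It does not search for plans, which is the solver's job. It assumes a state is a dictionary mapping each block to its location (a block or `"table"`) and a plan is a list with at most one `(block, location)` move per time step, where `None` means no move. The function names are hypothetical.

```python
def clear(state, x, ignore=None):
    """No block (other than `ignore`) is on x."""
    return all(loc != x for b, loc in state.items() if b != ignore)


def apply_move(state, block, dest):
    """One moveop(block, dest): the moved block must be clear and no other
    block may be on dest; every other block keeps its location (inertia)."""
    if not clear(state, block):
        raise ValueError(f"{block} is not clear")
    if dest == block:
        raise ValueError("a block cannot be moved onto itself")
    if dest != "table" and not clear(state, dest, ignore=block):
        raise ValueError(f"another block is already on {dest}")
    new_state = dict(state)
    new_state[block] = dest
    return new_state


def execute(state, plan):
    """Apply at most one move per time step (no concurrency)."""
    for step in plan:
        if step is not None:
            state = apply_move(state, *step)
    return state


start = {"a": "table", "b": "a", "c": "table"}
print(execute(start, [("b", "table"), ("a", "b"), None]))
# {'a': 'b', 'b': 'table', 'c': 'table'}
```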
solution(1..n).

% effect of moving a block
on(S,B,L,T1) :- block(B), location(L),
   moveop(S,B,L,T), next(T,T1), solution(S).

% a block can be moved only when it's clear
:- location(L), block(B), block(B1), time(T),
   moveop(S,B,L,T), on(S,B1,B,T), solution(S).

% no two blocks can be on the same block at the same time
:- 2{on(S,B1,B,T) : block(B1)}, time(T), block(B), solution(S).

% wherever a block is, it's not anywhere else

non(S,B,L1,T) :- time(T), location(L1), location(L), block(B),
   on(S,B,L,T), not eq(L,L1), solution(S).

% every block is supported by the table

supported(S,B,T) :- block(B), time(T), on(S,B,table,T), solution(S).
supported(S,B,T) :- block(B), block(B1), time(T), on(S,B,B1,T),
   supported(S,B1,T), not eq(B,B1), solution(S).
:- block(B), time(T), not supported(S,B,T), solution(S).

% no concurrency

:- 2 {moveop(S,B,L,T) : block(B) : location(L)}, time(T), solution(S).
% inertia

on(S,B,L,T1) :- location(L), block(B), on(S,B,L,T),
   not non(S,B,L,T1), next(T,T1), solution(S).

% initial values and actions are exogenous

1 {non(S,B,L,0), on(S,B,L,0)} 1 :- block(B), location(L), solution(S).

:- non(S,B,L,T), on(S,B,L,T), block(B), location(L), time(T), solution(S).

{moveop(S,B,L,T)} :- block(B), location(L), time(T), T < lasttime,
   solution(S).

% auxiliary predicates
time(0..lasttime).

next(T,T+1) :- time(T), lt(T,lasttime).

location(L) :- block(L).
location(table).

goal(S) :- time(T), goal(S,T), solution(S).
:- not goal(S), solution(S).

% compute distinct plans

different(S1,S2) :- time(T), moveop(S1,X,Y,T), not moveop(S2,X,Y,T),
   solution(S1;S2), block(X), location(Y), S1 < S2.
different(S1,S2) :- time(T), not moveop(S1,X,Y,T), moveop(S2,X,Y,T),
   solution(S1;S2), block(X), location(Y), S1 < S2.
:- not different(S1,S2), solution(S1;S2), S1 < S2.

Fig. B 15. A reformulation of the Blocks World program shown in Fig. B 14 to compute
n distinct plans.
% for every time step T, check that the T'th actions
% of plans P1 and P2 are different:

different(P1,P2,T) :- moveop(P1,X,Y,T), not moveop(P2,X,Y,T),
   time(T), solution(P1;P2), block(X), location(Y), P1 < P2.

different(P1,P2,T) :- not moveop(P1,X,Y,T), moveop(P2,X,Y,T),
   time(T), solution(P1;P2), block(X), location(Y), P1 < P2.

% and define the Hamming distance between two plans P1 and P2
% in terms of these differences:

hammingdistance(P1,P2,H) :- H{different(P1,P2,T) : time(T)}H,
   solution(P1;P2), distRange(H), P1 < P2.

Fig. B 16. An ASP formulation of the Hamming distance $D_h$ for two plans.
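In Fig. B 16 both plans range over the same time steps, and a step counts once however many of its moves differ. Since Fig. B 14 allows at most one move per step, this amounts to comparing the plans step by step. The Python sketch below illustrates this. It assumes each plan is a list indexed by time step whose entries are `(block, location)` moves or `None` for no move. The function name is hypothetical.

```python
def hamming_steps(plan1, plan2):
    """Number of time steps at which the two plans perform different moves."""
    if len(plan1) != len(plan2):
        raise ValueError("both plans must range over the same time steps")
    return sum(1 for m1, m2 in zip(plan1, plan2) if m1 != m2)


p1 = [("b", "table"), ("a", "b"), None]
p2 = [("b", "table"), ("a", "c"), None]
p3 = [None, ("b", "table"), ("a", "b")]
print(hamming_steps(p1, p2), hamming_steps(p1, p3))  # 1 3
```

Note that the shifted plan `p3` performs the same two moves as `p1` one step later, yet it is at the maximum distance 3, because the distance compares moves time step by time step.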

somedistance(H) :- hammingdistance(P1,P2,H),
   solution(P1;P2), distRange(H).
notmaxdistance(H1) :- somedistance(H1), somedistance(H2),
   H2 > H1, distRange(H1;H2).
totaldistance(H) :- not notmaxdistance(H),
   distRange(H), somedistance(H).

Fig. B 17. An ASP formulation of the distance $\Delta_h$ for a set of plans.



